Keef.ai finds context cancer
- A post coins “context cancer” to describe how bloated conversation histories (for example, 116MB snapshots rebroadcast in loops) degrade agent performance before the model even runs.
- The author released a tool called Codex Storm Doctor, aimed at detecting and pruning harmful history bloat to restore healthier agent state.
- The note highlights that unpruned context growth can silently cause planning and tool-routing failures in production agents. (x.com/keef_ai)
Keef.ai’s “context cancer” post names a failure mode that many agent teams already see in traces but often misdiagnose in incident review: the system degrades before the model ever gets a fair shot. In Keith Tyser’s example, a Codex thread was dragging because giant snapshots were carried and rebroadcast through the session, including loops that replayed 116MB state payloads; the replays inflated the history and crowded out the working context. Tyser paired the post with a diagnostic utility he calls Codex Storm Doctor, part of a broader set of “doctor” tools on his site for inspecting agent failures and session health. (agent.keithtyser.com)
The key claim is mechanical, not metaphorical. If an agent keeps replaying oversized history, snapshots, tool output or reconnect artifacts, the model is forced to work from a polluted prompt. That can show up as missed plans, repeated work, dropped self-checks, poor tool choices or routing errors: symptoms that look like reasoning failures but are really upstream failures in how the context was shaped. Tyser’s own writing has made a similar point elsewhere, arguing that extra context can shorten or blunt the model’s self-checking. (github.com)
Codex Storm Doctor appears to be aimed at that upstream layer. Tyser’s site describes the tool as a way to check whether Codex thread drag comes from giant snapshots, image-heavy reconnect churn or an update regression. A related open-source repository, `codex-doctor`, describes a local Codex plugin that reads session metadata and rollout logs, then reports on context growth, repeated work, slow tools, compactions, rollbacks and failed tool runs. The repository says it attributes recent context growth across categories including history carryover, file reads, shell output, search output, conversation and web/search. (agent.keithtyser.com) That matters because “bad agent behavior” is often a bundle of different problems collapsed into one label.
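The idea of attributing context growth by category can be illustrated with a short sketch. It is not `codex-doctor`'s implementation and does not read the real Codex rollout format: it assumes a hypothetical JSONL log in which each line carries a `category` and a `text` field, sums UTF-8 bytes per category as a rough proxy for context size, and reports each category's share. The function name `attribute_growth` and the snake_case category labels are illustrative assumptions.

```python
import json
from collections import Counter
from collections.abc import Iterable

# Hypothetical category labels loosely mirroring the ones codex-doctor reports.
# The record schema {"category": ..., "text": ...} is an assumption, not the
# real Codex rollout JSONL format.
CATEGORIES = {
    "history_carryover", "file_read", "shell_output",
    "search_output", "conversation", "web_search",
}


def attribute_growth(jsonl_lines: Iterable[str]) -> dict[str, float]:
    """Return each category's share of total UTF-8 bytes, largest share first."""
    sizes = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        category = record.get("category", "other")
        if category not in CATEGORIES:
            category = "other"  # bucket unknown labels instead of failing
        sizes[category] += len(record.get("text", "").encode("utf-8"))
    total = sum(sizes.values())
    if total == 0:
        return {}
    return {cat: size / total for cat, size in sizes.most_common()}


log = [
    '{"category": "conversation", "text": "plan the refactor"}',
    '{"category": "history_carryover", "text": "' + "x" * 983 + '"}',
]
print(attribute_growth(log))
```

In this toy log, history carryover accounts for 98.3% of the bytes, which would point toward stale or replayed state rather than toward the model itself. Bytes are only a proxy here; token counts would be a closer measure of how much prompt budget each category consumes.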
A production team may see an agent looping on file reads, choosing the wrong tool, or abandoning a plan halfway through, and conclude that the model is weak or the prompt is bad. A context-health view can point somewhere else: the planner may be starved by stale carryover, the tool router may be reacting to noisy history, or the agent may be spending its budget re-ingesting artifacts it should have summarized, compacted or dropped. The `codex-doctor` repository explicitly lists repeated-work detection and context-growth attribution among its core signals, which fits that diagnosis. (github.com)
The broader lesson from Tyser’s post is that agent reliability depends on memory hygiene as much as on raw model capability. Long histories are not automatically useful histories. If state is duplicated, replayed or left unpruned, the failure can be silent: the session still runs, tools still fire and logs still populate, but planning quality and execution discipline decay underneath. Tyser’s site frames several of his diagnostics around this kind of hidden systems failure rather than around benchmark performance. (agent.keithtyser.com)
The next step is already concrete for teams that want to check their own sessions. Tyser’s site lists Codex Thread Snapshot Storm Doctor among its current tools, and the `codex-doctor` GitHub repository provides commands for running local diagnostics against Codex session data and rollout JSONL logs. (agent.keithtyser.com)
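The memory-hygiene point about state that is duplicated, replayed or left unpruned can be made concrete with a deduplication pass. This is a generic technique, not a description of how Codex Storm Doctor or `codex-doctor` work: it assumes the history is a plain list of strings, hashes each large item with SHA-256, keeps the first copy of any payload, and replaces later verbatim repeats with a short stub. `prune_snapshot_storm`, `PruneReport` and the `min_bytes` threshold are hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PruneReport:
    history: list[str]  # same length as the input history
    repeats: dict[str, int] = field(default_factory=dict)  # short digest -> stubbed copies
    bytes_saved: int = 0


def prune_snapshot_storm(history: list[str], min_bytes: int = 1_000_000) -> PruneReport:
    """Keep the first copy of each large payload and stub out verbatim repeats.

    Hypothetical helper: `min_bytes` is an illustrative threshold, not a value
    used by Codex Storm Doctor or codex-doctor.
    """
    seen: set[str] = set()
    report = PruneReport(history=[])
    for item in history:
        raw = item.encode("utf-8")
        if len(raw) < min_bytes:
            report.history.append(item)  # small items are cheap; leave them
            continue
        digest = hashlib.sha256(raw).hexdigest()  # full digest for equality checks
        if digest not in seen:
            seen.add(digest)
            report.history.append(item)  # first occurrence stays in context
            continue
        short = digest[:12]  # shortened digest only for display and reporting
        stub = f"[snapshot {short} repeated; see earlier copy]"
        report.history.append(stub)
        report.repeats[short] = report.repeats.get(short, 0) + 1
        report.bytes_saved += len(raw) - len(stub.encode("utf-8"))
    return report


snapshot = "S" * 2_000_000  # stand-in for a large state payload
result = prune_snapshot_storm(["plan", snapshot, "tool call", snapshot, snapshot])
print(result.repeats, result.bytes_saved)
```

Stubbing rather than deleting keeps the history the same length, so positional references elsewhere in a session stay valid. The sketch only catches exact duplicates; deciding when an older snapshot is stale rather than merely repeated would need session-specific logic.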