Postmortem: three 'silent failures' caused 73% drop in Claude Code's coding performance

- Anthropic said on April 23 that three separate product changes — not a model downgrade — quietly hurt Claude Code, Claude Agent SDK, and Claude Cowork. (anthropic.com) - The biggest hit came after Claude Code’s default reasoning effort was cut from high to medium on March 4, then restored April 7. (anthropic.com) - It matters because developers blamed the model itself, but Anthropic says the API and inference layer were never affected. (anthropic.com)

Anthropic’s coding assistant didn’t suddenly get “dumber” in one clean, obvious way. Basically, three separate changes stacked on top of each other and made Claude(anthropic.com)hropic to isolate. The company published a postmortem on April 23 saying the model itself wasn’t degraded at the API or inference la(anthropic.com)nstructions. (anthropic.com) ### What broke first? The first cha(anthropic.com)ort from high to medium because some users were seeing very long waits — long enough that the UI could look frozen. That fixed a latency pain point for some people, but it also meant the assistant spent less effort on harder coding tasks. Anthropic now says that was the wrong tradeoff and rolled it back on April 7. (anthropic.com) ### Why did that matter so much? Claude Code isn’t just autocomplete. It rea(anthropic.com)-step task together. In that setup, “reasoning effort” is basically how much time the model gives itself to think before acting. Anthropic’s own product materials frame extended thinking as a way to get better performance on tricky problems, so cutting the default from high to medium was bound to show up most on complex engineering work. (anthropic.com) ### What was(anthropic.com)d a change meant to clear older thinking from sessions that had been idle for more than an hour, which was supposed to reduce latency when a user came back later. But a bug kept clearing that older thinking on every turn for the rest of the session. So Claude could seem forgetful, repetitive, or weirdly unable to carry forward its own earlier reasoning. Anthropic fixed that on April 10. (anthropic.com) ### Why did users describe(anthropic.com) thing long coding sessions depend on — continuity. If an agent keeps dropping parts of its earlier chain of thought, you get behavior that feels sloppy rather than spectacularly broken. It may revisit solved questions, lose track of conventions, or make edits that ignore earlier context. That matches the pattern outside users were documenting in long-session audits and bug reports. (anthropic.com) ### And the thir(anthropic.com)e verbosity. On its own that might sound harmless. But in combination with other prompt changes, Anthropic says it hurt coding quality and was reverted on April 20. That hit Claude Code, plus Claude Cowork and the Agent SDK. In plain English — the assistant wasn’t just thinking less in some cases, it was also being nudged to say less, which is a bad combo for coding help. (anthropic.com) ### Why was this so hard to detect in(anthropic.com)ferent slice of traffic on a different schedule, so the overall effect looked broad but inconsistent. Anthropic says early user complaints were hard to distinguish from normal variation, and its internal usage plus evals didn’t initially reproduce the issues. That’s the uncomfortable part of the story — users were feeling a real regression before the company had a clean internal read on it. (anthropic.com(anthropic.com)ot impacted, and it could immediately confirm the inference layer was unaffected. The blast radius was the product experience around Claude Code and related agent tools, not the underlying model service itself. That distinction matters if you’re building on the API versus using Anthropic’s own coding interface. (anthropic.com) ### What changes now? Anthropic says all three issues were resolved by April 20 in Claude Code v2.1.116, and it reset (anthropic.com)out trust. When people use an AI coding agent every day, they notice quality drift fast — even when telemetry and evals lag behind. (anthropic.com) ### Bottom line This wasn’t one dramatic outage. It was three quiet regressions that chipped away at reasoning, memory, and explanation at the same time. That’s why the experienc(anthropic.com)etter guardrails for product changes around the model. (anthropic.com)

Postmortem: three 'silent failures' caused 73% drop in Claude Code's coding performance

Get your own daily briefing