Anthropic restores 'xhigh' reasoning after postmortem finds three 'silent failures' broke Claude Code

- Anthropic said it identified three product changes that degraded Claude Code’s coding performance and reverted the system’s reasoning setting back to “xhigh.” (x.com) - The revert reportedly aims to trade speed for better reasoning to stop UI freezes and restore developer trust while internal docs (CLAUDE.md) and multi‑instance tips circulate to users. (x.com) - Anthropic’s fixes signal a rapid rollback workflow for coding regressions, and the company has circulated short‑term guidance for code prompts and agent instances. (x.com)

Claude Code is Anthropic’s agentic coding tool — the thing that reads a repo, edits files, runs tests, and tries to act like a junior engineer with infinite patience. For weeks, developers said it had gotten worse. Answers felt shallower. Sessions got weirdly forgetful. Anthropic has now said the complaints were real, traced the regression to three separate product changes, and rolled the defaults back toward more reasoning-heavy behavior. (anthropic.com) ### What actually broke? The important detail is that Anthropic says the underlying API and inference layer were not the problem. The damage happened in the product layer around Claude Code, Claude Agent SDK, and Claude Cowork. That matters because it means a coding agent can degrade a lot even when the base model is unchanged — basically, the wrapper can quietly sabotage the model. (anthropic.com) ### Why did users feel Claude got “dumber”? First, Anthropic changed Claude Code’s default reasoning effort on March 4 from high to medium. The goal was to cut latency because some users were seeing long pauses that made the UI look frozen. But the tradeoff was worse coding performance. Anthropic says users preferred higher intelligence by default, so it reverted that change on April 7. The affected models were Sonnet 4.6 and Opus 4.6. (anthropic.com) ### What were the other two failures? The second issue landed on March 26. Anthropic tried to clear older “thinking” from sessions that had been idle for more than an hour so resumed sessions would feel faster. But a bug kept clearing that history on every turn for the rest of the session. That made Claude seem repetitive and forgetful. Anthropic fixed that on April 10. (anthropic.com) The third issue came on April 16, when Anthropic added a system prompt instruction to make Claude less verbose. In combination with other prompt changes, that hurt coding quality too. Anthropic reverted that change on April 20. This one hit Sonnet 4.6, Opus 4.6, and Opus 4.7. (anthropic.com) ### Why call them “silent failures”? Because none of these changes looked like a single obvious outage. They hit different products, different traffic slices, and different dates. So from the outside, users saw something fuzzier — broad, inconsistent degradation. That’s the nasty version of failure for AI products. The system still responds. It just responds worse, and not always in the same way. (anthropic.com) ### Where does “xhigh” fit in? Anthropic’s public postmortem talks about reverting the default from medium back to high. But the bigger lesson is the same one developers have been pushing for: coding agents live or die on reasoning budget. More thinking usually means better output, even if it costs more time and tokens. The “xhigh” discussion around Claude Code is really about that trade — speed versus depth — and Anthropic has clearly moved back toward the depth side after this episode. That last step is an inference from the rollback direction and the surrounding workflow guidance, not a direct quote from the postmortem. (anthropic.com) ### Why are CLAUDE.md and multi-instance workflows suddenly part of the story? Because once people realized the model wasn’t the whole problem, attention shifted to harness design. Anthropic’s own materials keep emphasizing structured project memory, long-running agent harnesses, and ways to manage context across sessions. CLAUDE.md is the practical version of that idea — a project memory file that reduces drift. Running multiple instances in separate checkouts is another workaround for the same constraint: one agent can lose the thread, so teams build workflows that compartmentalize work. (anthropic.com) ### Why does this matter beyond Anthropic? Because this is the clearest recent example of how fragile agent products still are. Not broken in the dramatic sense — broken in the trust sense. Developers thought Claude Code had been nerfed. Anthropic initially couldn’t reproduce the problem with internal usage or evals. Then it found three separate regressions and said users were right to complain. That gap is the real story. If you build on coding agents, model quality is only part of the stack. The defaults, prompts, memory behavior, and latency hacks are part of the model now too. (anthropic.com) ### Bottom line? Anthropic didn’t discover one catastrophic bug. It discovered that three “reasonable” product tweaks stacked into a worse coding agent. That’s why the rollback matters. It’s not just a settings change — it’s a reminder that with AI coding tools, product plumbing can matter as much as the model. (anthropic.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.