Claude showed a measurable 'nerf' in public logs
An AMD senior AI director analysed session logs for Anthropic’s Claude and reported that median 'thinking' length fell from ~2,200 to ~600 characters while API requests surged about 80× and self‑contradictions tripled, prompting users to adopt a '/effort max' workaround. The thread highlights a measurable shift in model behavior under operational load. (x.com)
Claude Code users published a log-based case that the tool got markedly shorter on “thinking” and less reliable in complex coding sessions in late February and March. (github.com) The core dataset came from 6,852 Claude Code session files, 17,871 thinking blocks, and 234,760 tool calls collected from January 30 to April 1, 2026, then filed as GitHub issue #42796 by Stella Laurenzo, whose profile identified her with Advanced Micro Devices. (github.com, theregister.com) A separate write-up of the issue said median visible thinking length fell about 67%, from roughly 2,200 characters to about 600, while monthly application programming interface requests rose from 1,498 to 119,341 and estimated monthly Amazon Web Services Bedrock cost rose from $345 to $42,121. (tokencost.app) In Anthropic’s system, “extended thinking” is extra scratchpad time for harder tasks, like giving a programmer more minutes before touching the code. Anthropic said in February 2025 that developers can set a “thinking budget” to control how long Claude spends reasoning on a problem. (anthropic.com) The complaint matters to software teams because Claude Code is not a chatbot for one-off answers; Anthropic markets it as a terminal-based coding agent that reads files, edits code, and handles git workflows inside real repositories. (github.com) The GitHub report said the model’s behavior shifted from research-first to edit-first. Before March 8, the logs showed 6.6 reads per edit; after March 8, that fell to 2.0, while edits without a prior read rose from 6.2% to 33.7%. (github.com, tokencost.app) Other failure signals in the same analysis moved the same way. Full-file rewrites rose from 4.9% to 11.1%, user interrupts per 1,000 tool calls rose from 0.9 to 11.4, and the analyzer tracked more premature stopping, reasoning loops, and self-admitted errors. (tokencost.app, github.com) Anthropic has publicly acknowledged past quality drops, but it has framed them as infrastructure bugs rather than load-based throttling. In a September 17, 2025 postmortem, the company wrote, “We never reduce model quality due to demand, time of day, or server load.” (anthropic.com) That leaves the current dispute centered on cause, not just symptoms. The GitHub issue tied the regression to the rollout of thinking redaction in Claude Code, while Anthropic’s earlier public position was that degraded quality should be investigated as bugs in serving or infrastructure. (github.com, anthropic.com) Users responded with workarounds that try to force more reasoning back into the session. Anthropic’s own newer model cards describe evaluations run with “max effort” or “high effort” settings, which matches the community habit of adding prompts such as “/effort max” when Claude starts acting terse. (www-cdn.anthropic.com, anthropic.com) The thread that set this off was not a vibe check. It was a claim that one model behavior change showed up first in the logs, then in the code, and only later in public complaints. (x.com, github.com)