Opus 4.6 'Nerf' Signals
Users and benchmarkers report a sharp drop in Claude Opus 4.6's factual accuracy on hallucination-focused tests: a BridgeBench retest put the model at 68.3% correct, down from 83.3% a week earlier, and its leaderboard rank fell from #2 to #10. The retest results and follow-up confirmations attributed to the LMArena (formerly LMSYS) community have circulated on social channels as evidence of degraded reasoning performance after release. Users also link the change to faster token consumption and stricter retroactive limits on Pro/Max plans (x.com) (x.com).
A benchmark that checks whether an artificial intelligence model invents facts is now driving new complaints about Claude Opus 4.6. BridgeMind said a fresh BridgeBench run put the model at 68.3% on correct hallucination answers, down from 83.3% a week earlier. (ai-primer.com)

BridgeMind’s post said that drop moved Opus 4.6 from No. 2 to No. 10 on its leaderboard. The same post said hallucinations rose by 98% in the retest. (ai-primer.com)

Hallucination tests measure whether a model answers with made-up details instead of saying it does not know. Anthropic has long treated that problem as a product issue, saying Claude 2.1 in November 2023 delivered “significant reductions” in hallucination rates. (anthropic.com)

Anthropic introduced Claude Opus 4.6 on February 5, 2026 and said it was its “smartest model,” with leading results in coding, tool use, search, and finance. The company’s product page says Opus 4.6 is available to Pro, Max, Team, and Enterprise users. (anthropic.com 1) (anthropic.com 2)

Anthropic’s consumer pricing page says Pro costs $20 a month and Max starts at $100 a month, with Max offering 5 times or 20 times more usage than Pro. The same page says “usage limits apply,” which is the language users have pointed to while arguing that access has tightened after launch. (claude.com)

The complaints are spreading through social posts and developer forums rather than through a published Anthropic changelog. LMArena, the platform formerly known as LMSYS, describes itself as an open platform for evaluating artificial intelligence systems through human preference, and BridgeMind has cited follow-up confirmations from that community in its posts. (lmarena.ai) (bridgemind.ai)

Public LMArena leaderboards show how volatile model standings can be across categories and dates, with rankings updating as new votes come in.
That makes any single retest a snapshot, but it also means developers who buy a named model can see different behavior over time without a new model name. (lmarena.ai)

As of April 13, 2026, Anthropic has not published a post explaining any Opus 4.6 rollback, safety retune, or inference-speed change. Until the company comments, the central claim in circulation is narrower: outside benchmarkers say the model now answers this hallucination test less accurately than it did days earlier. (anthropic.com)
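For readers weighing how much a single retest can show, the arithmetic behind the reported scores can be sketched as follows. The two percentages (83.3% and 68.3%) come from the BridgeMind figures cited above; the question counts are hypothetical, because the reports quoted here do not say how many items the benchmark contains. The sketch also assumes the two runs are independent samples scored the same way, which may not match how BridgeBench actually works.

```python
import math


def drop_summary(before_pct, after_pct):
    """Return the drop in percentage points and as a share of the earlier score."""
    point_drop = before_pct - after_pct
    relative_drop_pct = point_drop / before_pct * 100
    return point_drop, relative_drop_pct


def two_proportion_z(p_before, p_after, n_before, n_after):
    """Pooled two-proportion z statistic for a change in accuracy.

    n_before and n_after are the number of questions in each run.
    They are assumptions here; the source does not report them.
    """
    pooled = (p_before * n_before + p_after * n_after) / (n_before + n_after)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    return (p_before - p_after) / std_err


if __name__ == "__main__":
    points, relative = drop_summary(83.3, 68.3)
    print(f"Drop: {points:.1f} points, {relative:.1f}% of the earlier score")

    # Hypothetical benchmark sizes, chosen only to show the effect of sample size.
    for n in (30, 100, 500):
        z = two_proportion_z(0.833, 0.683, n, n)
        print(f"n={n} questions per run: z = {z:.2f}")
```

Under these assumptions, the same 15-point gap falls short of the conventional 1.96 threshold with 30 questions per run and clears it comfortably with 500, so the weight of a single retest depends heavily on a benchmark size the reports do not state.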