Anthropic Delays Model
- Anthropic delayed releasing an AI model because it was particularly effective at finding and exploiting software bugs. (x.com) - The company postponed the model after discovering it could uncover unpatched flaws, raising safety and security concerns. (x.com) - The episode highlights how advanced models can both surface vulnerabilities and complicate safe deployment practices. (x.com)
Software bugs are mistakes in code that can crash a program or open a door to attackers. Anthropic said it held back public release of Claude Mythos Preview after tests showed the model was unusually strong at finding and exploiting those flaws. (anthropic.com) Anthropic announced Mythos Preview and its related Project Glasswing on April 7, 2026. The company said the model would go first to a limited group that includes Amazon Web Services, Apple, Cisco, Google, Microsoft, NVIDIA and more than 40 additional organizations that maintain critical software. (anthropic.com) In a technical assessment published the same day, Anthropic said Mythos Preview could identify and exploit zero-day vulnerabilities — previously undiscovered bugs — in every major operating system and every major web browser during testing. The company said more than 99% of the vulnerabilities it found were still unpatched, so it withheld most details. (red.anthropic.com) Anthropic also said the model could reverse-engineer exploits from closed-source software and turn N-day vulnerabilities — known bugs that have not been widely patched yet — into working attacks. Its example of age and depth was a now-patched OpenBSD flaw that the company said had survived for 27 years. (red.anthropic.com) The company framed the delay as part of its Responsible Scaling Policy, a rulebook it updates as models get more capable. In the current version, updated April 2, 2026, Anthropic says it may pause development or deployment whenever it thinks extra safeguards are needed. (anthropic.com) That posture has put Mythos in a narrow lane: not a normal product launch, but a controlled rollout to defenders. Anthropic said it is committing up to $100 million in usage credits and $4 million in donations to open-source security groups through Glasswing. (anthropic.com) The restricted release has not stopped outside pressure. Bloomberg reported on April 21 that a small group of unauthorized users accessed Mythos, a model Anthropic itself had described as capable of enabling dangerous cyberattacks. (bloomberg.com) At the same time, Axios reported on April 19 that the National Security Agency was using Mythos Preview despite Pentagon objections to Anthropic. That split showed how a model built to help defenders can also become a tool governments want quickly. (axios.com) The last big U.S. example of a major model being held back on safety grounds came in 2019, when OpenAI staged GPT-2’s release instead of publishing the full system at once. Anthropic is now making a similar argument in a different domain: not spam or fake text, but software exploitation. (openai.com) For now, Anthropic is keeping Mythos behind a gate while it tries to use the model to patch code faster than attackers can weaponize it. The company’s own testing suggests that balance, not benchmark scores, is what will decide when — or whether — the wider public gets access. (red.anthropic.com)