Anthropic’s Mythos paused over risk
Anthropic’s new Claude Mythos model appears to be a capability leap, leading many benchmarks and scoring very high on coding and cybersecurity tasks, but the company is deliberately limiting its release after finding it can identify and exploit real vulnerabilities. That capability has two consequences at once: coding agents are about to get much better at repo‑wide reasoning, and models with stronger exploit capabilities force tighter sandboxing, permission controls, and deployment policy. The restraint highlights how frontier capability and operational risk are colliding in real time.
Anthropic did something unusual on April 7: it announced a new flagship model and then refused to release it to the public. In its own system card, the company says Claude Mythos Preview showed such a large jump in capability that it will be limited to a defensive security program instead of general availability. (anthropic.com)
The reason is not that the model writes prettier code. The reason is that Anthropic says Mythos can find and exploit real software flaws, including previously unknown “zero-day” bugs, across major operating systems and web browsers. (red.anthropic.com) A zero-day bug is a hidden crack in software that defenders do not know exists yet. If an attacker finds that crack first, they can sometimes run their own code on someone else’s machine before a patch is ready. (red.anthropic.com)
Anthropic says Mythos did not just point at suspicious code. Its red-team report says the model could identify a bug, build a working exploit, and in some cases do it autonomously after a single prompt. (red.anthropic.com, forbes.com)
The examples are the part that made people stop scrolling. Anthropic says Mythos found vulnerabilities in every major operating system and every major web browser it tested, and one disclosed case involved a now-patched 27-year-old bug in OpenBSD, a system with a long security-focused reputation. (red.anthropic.com)
This is also a coding story, not just a hacking story. On SWE-bench Verified, a benchmark built from real GitHub issues in open-source Python repositories, Mythos scored 93.9%, which Anthropic and public benchmark trackers show as a leading result. (forbes.com, benchlm.ai) That score matters because modern codebases are not single files; they are cities of files, tests, dependencies, and old decisions. A model that can keep the whole map in its head gets better at fixing bugs, but it also gets better at spotting where one small mistake unlocks the rest of the system. (anthropic.com, red.anthropic.com)
Anthropic’s answer is Project Glasswing, a restricted program announced the same day as Mythos. The launch partners include Amazon Web Services, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, NVIDIA, Palo Alto Networks, and the Linux Foundation, with access also extended to more than 40 other organizations that maintain critical software infrastructure. (anthropic.com) Anthropic says it is putting up to $100 million in usage credits and $4 million in donations behind that effort. The idea is to give defenders an early look at tools that can scan, reproduce, and patch weaknesses before similar capabilities spread more widely. (anthropic.com)
The company’s own safety documents make clear that the problem is no longer just “can the model code.” The problem is where the model runs, what systems it can touch, what permissions it has, and how much monitoring stands between a useful assistant and an automated exploit generator. (anthropic.com)
That is why this launch feels different from a normal benchmark win. Anthropic is treating capability as something that now changes deployment policy in real time: tighter access, named partners, defensive use only, and no public release until the guardrails catch up. (anthropic.com)
The short version is that the industry just got a preview of the next phase of artificial intelligence coding tools. They are becoming good enough to read giant codebases like senior engineers and dangerous enough that the model release itself now looks more like handling a sensitive security tool than shipping a chatbot. (anthropic.com, red.anthropic.com)