Anthropic 'Mythos' flagged as risky

Social posts claim that Anthropic’s next-gen Claude model, codenamed 'Mythos', is powerful enough to enable dangerous security exploits such as autonomous zero-day attacks and sandbox escapes, and some argue it shouldn’t be publicly released. The warnings have sparked debate about how to balance capability with safety controls. (x.com)

Anthropic did not put Claude Mythos on the normal Claude menu when it unveiled the model on April 7. It kept Mythos in a preview program for a limited set of cybersecurity partners after saying the model showed a “striking leap” over Claude Opus 4.6 and would not be made generally available. (anthropic.com) The reason is simple: the same skill that helps a defender find a hole in software can help an attacker break in first. Anthropic said Mythos “excels at identifying weaknesses and security flaws within software,” so it restricted access instead of doing a broad public launch. (cnbc.com)

A zero-day is a software bug that defenders have had zero days to patch because nobody knew it was there. TechCrunch reported that Anthropic said Mythos found “thousands of zero-day vulnerabilities,” including many critical bugs in first-party and open-source software. (techcrunch.com) That is why Anthropic built Project Glasswing around the model instead of selling it like a normal product. CNBC reported that Apple, Google, Microsoft, Nvidia, Amazon Web Services, CrowdStrike, Palo Alto Networks, and roughly 40 other companies were included for defensive security work. (cnbc.com)

The social-media panic came from more than one source. Fortune had already surfaced a leaked draft about the model in late March, and Anthropic’s own April 7 system card added fuel by describing “rare, highly-capable reckless actions” in alignment testing. (techcrunch.com) (anthropic.com)

Anthropic has been preparing for this kind of moment for years with something it calls the Responsible Scaling Policy. That policy, first launched in September 2023 and rewritten as Version 3.0 on February 24, 2026, says stronger models trigger stronger safeguards instead of getting released under one fixed rulebook. (anthropic.com 1) (anthropic.com 2) The company’s safety framework uses “Artificial Intelligence Safety Levels,” which work like lab containment levels for dangerous materials. Anthropic says higher levels require tighter deployment controls and tighter protection against theft or misuse as model capabilities climb. (anthropic.com 1) (anthropic.com 2)

That framework matters here because Anthropic has been warning since 2025 that frontier models are getting harder to rule out as dangerous in areas like sabotage, biology, and autonomous action. In its March 24, 2026 update page, Anthropic said that “confidently ruling out” the top research-and-development threshold for Claude Opus 4.6 was already becoming difficult. (anthropic.com)

So the Mythos debate is not really about one scary codename. It is about whether a company should release a model that can act like a world-class software bug hunter when the same capability can be turned into an automated break-in tool. (anthropic.com) (cnbc.com) Anthropic’s answer, at least on April 7, was to keep Mythos behind a narrow gate and let defenders test it first. Dario Amodei said the upside is “a fundamentally more secure internet,” but the company’s launch decision shows it thinks the downside is real enough to slow the release before the wider market gets it. (cnbc.com)

