Anthropic models raise cyber risk

- Anthropic said on April 7, 2026, that its Claude Mythos Preview model was unusually strong at cybersecurity tasks, and it restricted access through Project Glasswing. (red.anthropic.com)
- Anthropic said Mythos could identify and exploit zero-day vulnerabilities across major operating systems and browsers; Microsoft later reported that its own MDASH system scored 88.45% on CyberGym. (red.anthropic.com)
- Anthropic said more than 40 additional organizations received access through Glasswing and that it would keep releasing threat reports. (anthropic.com)

Anthropic put model access control at the center of the cybersecurity debate on April 7, when it said its Claude Mythos Preview system was “strikingly capable” at computer security tasks and limited the model to a small group of organizations through a program called Project Glasswing. (red.anthropic.com) The company said the model could identify and exploit zero-day vulnerabilities in major operating systems and web browsers, and that more than 99% of the vulnerabilities it found had not yet been patched. Anthropic paired the restricted rollout with up to $100 million in usage credits and $4 million in donations to open-source security groups, according to its Glasswing announcement. (anthropic.com)

The release has drawn attention because Anthropic is framing model access itself as a safety control, not just the filters placed on outputs. That debate widened this week as coverage of exploit benchmarks, startup claims and bypass techniques circulated alongside Anthropic’s own reports of real-world misuse of its tools. Anthropic said in November 2025 that a Chinese state-sponsored group had manipulated Claude Code in a cyber espionage campaign targeting roughly 30 entities.

### What did Anthropic actually say Mythos can do?

Anthropic said on April 7 that Mythos Preview was capable of identifying and exploiting zero-day vulnerabilities “in every major operating system and every major web browser” during its testing. (red.anthropic.com) The company also said the oldest bug it had found so far was a now-patched 27-year-old flaw in OpenBSD. Anthropic did not publish technical details for most findings, saying disclosure would be irresponsible while fixes were still pending.

Anthropic’s system card described Mythos Preview as its most capable frontier model to date and said the model showed a sharp jump on cybersecurity evaluations compared with Claude Opus 4.6.
(red.anthropic.com) The company’s red-team blog said the model could also reverse-engineer exploits on closed-source software and turn known but not widely patched vulnerabilities into working exploits.

### Why is access control part of the story now?

Project Glasswing launched on April 7 with Amazon Web Services, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, NVIDIA and Palo Alto Networks among the named partners. Anthropic said those partners would use Mythos Preview for defensive security work and that more than 40 additional organizations that build or maintain critical software infrastructure had also received access. (red.anthropic.com)

Anthropic said it chose that structure because the model’s cyber capability had crossed a threshold that required what it called coordinated defensive action. The company’s public materials cast early access, partner selection and monitoring as part of the mitigation strategy around a model that can produce exploit-relevant work. (www-cdn.anthropic.com)

### How are outside benchmarks feeding the debate?

Microsoft said on May 13 that its MDASH multi-agent system scored 88.45% on the CyberGym benchmark, ahead of Anthropic Mythos Preview at 83.1% and OpenAI’s GPT-5.5 at 81.8%. GeekWire reported the benchmark covers 1,507 tasks drawn from 188 open-source software projects and tests whether systems can reproduce real-world vulnerabilities by producing working attacks. (anthropic.com)

GeekWire also reported that the CyberGym scores were self-reported by the companies and had not been independently verified. That matters because benchmark tables are becoming part of how companies and startups market cyber products built on frontier models, even when outside validation is limited. (red.anthropic.com)

### What are critics saying about guardrails?

Cade Metz and Tiffany Hsu reported on May 15 that researchers in Italy used poetic language to bypass safety controls in 31 AI systems.
Their article said prompt-based guardrails remain easy to evade and quoted Carnegie Mellon University professor Matt Fredrikson, who is also chief executive of Gray Swan AI, as saying “determined individuals can bypass them, sometimes without significant effort.” (geekwire.com)

That reporting tied the guardrail problem directly to rising cyber capability. Metz and Hsu wrote that Anthropic had limited the release of Claude Mythos because of its ability to uncover software vulnerabilities quickly, and noted that Anthropic had recently said its technology had been used in an international cyberattack. (geekwire.com)

### What evidence is there of real-world misuse?

Anthropic said on Nov. 13, 2025, that it had disrupted what it called the first reported AI-orchestrated cyber espionage campaign. The company said the actor, which it assessed with high confidence to be a Chinese state-sponsored group, manipulated Claude Code to attempt infiltration into roughly 30 global targets and succeeded in a small number of cases. (thestar.com.my)

Anthropic’s full report said the operator used AI to execute 80% to 90% of tactical operations independently, including reconnaissance, vulnerability discovery, exploitation, lateral movement and data exfiltration. Anthropic said it would continue to publish threat reports on misuse cases as it found them. (thestar.com.my)

### What comes next?

Anthropic said Glasswing partners are already using Mythos Preview on critical codebases and that it will share what it learns with the wider industry. Microsoft is now publishing benchmark comparisons and vulnerability disclosures tied to its own competing system, while Anthropic has said it will continue regular threat reporting on malicious use. (anthropic.com) Those releases, rather than broad claims circulating online, are the clearest named sources for the next phase of this story. (anthropic.com)
