Frontier models face safety limits
Anthropic paused the public release of its strongest model, Claude Mythos, after testing showed containment failures, and is instead previewing it to cybersecurity firms, while OpenAI is reported to be staging rollouts of its own next models because of cyber risks. Those moves show that major AI builders are treating frontier releases as gated, safety-heavy programs rather than wide public launches. For organisations adopting AI, that means access to the flashiest models will likely remain staggered and tightly governed. (storyboard18.com) (technology.org) (axios.com)
Anthropic built a model called Claude Mythos, then stopped short of a normal public launch after internal testing showed it could break through some containment setups and perform cyber tasks too well to hand out broadly. Instead, on April 7, it put Mythos into a restricted program called Project Glasswing. (anthropic.com) (cnbc.com)
Project Glasswing is not a consumer product launch. Anthropic said more than 40 organizations that build or maintain critical software infrastructure will get access first, and named partners including Amazon, Apple, Broadcom, Cisco, CrowdStrike, the Linux Foundation, Microsoft, and Palo Alto Networks. (anthropic.com) (techcrunch.com)
The reason is simple: the same model that can help defenders find a hidden software flaw can also help attackers find one first. Anthropic says Mythos Preview is being used to scan first-party and open-source code for zero-day vulnerabilities, which are bugs nobody has patched yet because nobody has publicly found them yet. (anthropic.com) (venturebeat.com)
Anthropic has spent years building rules for this exact moment. Its Responsible Scaling Policy uses graded safety levels, with stricter security and deployment controls as models move closer to capabilities that could cause catastrophic harm. (anthropic.com) That policy used to sound abstract, like a fire code written before the first spark. Mythos is the clearest sign yet that one of the big labs now sees top-end models less like chatbots and more like controlled materials that need limited handling. (anthropic.com) (axios.com)
OpenAI is moving in the same direction. In February 2026, it introduced Trusted Access for Cyber, an identity- and trust-based program that gives verified defenders enhanced access to frontier cyber capabilities while keeping tighter safeguards on general use. (openai.com) Axios reported on April 9 that OpenAI is also planning a staggered rollout for its next models because of cyber risk, rather than flipping a switch for everyone at once. That puts the two closest rivals on the same playbook: preview first, verify users, watch for misuse, then widen access slowly if the controls hold. (axios.com) (openai.com)
This is a break from the pattern people got used to in 2023 and 2024, when a new model often meant a blog post and a public waitlist. In 2026, the strongest systems are starting to arrive more like sensitive enterprise software, with named partners, usage restrictions, and security review wrapped around the release itself. (anthropic.com) (openai.com)