Anthropic’s Safety Pause

Anthropic quietly kept a new model private after finding thousands of external vulnerabilities and launched an internal patching effort dubbed Project Glasswing before broader release. That decision highlights how model capabilities can create cascading cybersecurity risks that vendors must manage before public rollout. The episode underscores that production-ready AI demands not just accuracy but robust external security and remediation processes. (artificialintelligence-news.com)

Anthropic built a new model that was good enough at finding software holes that it decided not to release it to the public at all. Instead, on April 7, the company put it behind a restricted program called Project Glasswing and handed access only to selected defenders. (anthropic.com) The model is called Claude Mythos Preview, and Anthropic says it can find and exploit security flaws across “every major operating system and web browser.” In Anthropic’s telling, this was not a chatbot getting better at code suggestions; it was a system crossing into high-end bug hunting. (anthropic.com) A software vulnerability is a mistake in code that acts like an unlocked window in a house. If an attacker finds that window first, they can steal data, shut down systems, or move deeper into a network. (anthropic.com) Bug hunting usually takes skilled researchers weeks or months because modern software has millions of lines of code. Anthropic says Mythos Preview found thousands of high-severity vulnerabilities, including flaws that had survived years of prior review. (anthropic.com) That is why Anthropic did not do a normal launch. CNBC reported that Dianne Penn, Anthropic’s head of research product management, said there was “a lot of internal deliberation” before the company limited the rollout. (cnbc.com) Project Glasswing is the compromise Anthropic chose. Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks are the named launch partners using the model for defensive work instead of public experimentation. (anthropic.com) Anthropic says more than 40 additional organizations that maintain critical software infrastructure are also getting access. The company is committing up to $100 million in usage credits and $4 million in direct donations to open-source security groups to help patch what the model finds. (anthropic.com) The backstory is awkward for Anthropic too. In late March, Fortune found references to Mythos in a publicly accessible data cache, and Anthropic later said a human error in its content management system had exposed draft material about the unreleased model. (tech.yahoo.com) That leak mattered because it showed Anthropic was already describing Mythos internally as a “step change” in capability before the formal announcement. By April 8, NBC News reported that Anthropic researchers were saying the model had detected thousands of high- and critical-severity bugs, including some that may have sat undiscovered for decades. (tech.yahoo.com) (nbcnews.com) Axios called this one of the first clear cases of an artificial intelligence company holding back a model over broad societal risk instead of shipping first and adding rules later. That is a different kind of safety pause: not about rude answers or hallucinations, but about whether a model can accelerate real-world break-ins faster than defenders can patch them. (axios.com) The immediate question is not whether Mythos is “good” or “bad.” The question is whether companies can build a patching pipeline fast enough when one model can surface thousands of dangerous flaws across the same digital plumbing that banks, hospitals, browsers, and cloud services all share. (anthropic.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.