Anthropic holds back a new model

Anthropic said it is withholding release of a new AI model because it judges the model too capable to publish safely at this time. The company’s decision illustrates how capability concerns are shaping release plans among model vendors. (rte.ie)

Anthropic said on April 8 that it is not releasing its newest model, Claude Mythos Preview, to the public because the system can find and exploit software flaws too effectively. (anthropic.com) Instead, the company put the model into a restricted program called Project Glasswing with a small group of partners, including Amazon Web Services, Apple, Cisco, Google, JPMorganChase, Microsoft, Nvidia and Palo Alto Networks. (anthropic.com) Anthropic said those companies and open-source software maintainers will use Mythos Preview to look for “zero-day” bugs, meaning security holes that attackers could exploit before a fix exists. (anthropic.com)

The decision follows Anthropic’s updated Responsible Scaling Policy, published on February 24 and revised again on April 2, which says the company may restrict deployment or pause work when a model crosses dangerous capability thresholds. (anthropic.com) Anthropic’s policy sorts models into “AI Safety Levels,” a ladder for systems with growing catastrophic-risk potential; the company’s 2026 version says stronger safeguards or deployment limits should kick in as models approach higher-risk categories. (anthropic.com)

In its Mythos system card, Anthropic said the model showed unusually strong performance on cyber tasks and published a separate risk report on April 10 examining whether it could take harmful autonomous actions. (anthropic.com) Anthropic said the model is being released first to defenders because similar capabilities are likely to become broadly available later, and it wants critical software providers to patch important systems before attackers get comparable tools. (anthropic.com)

That puts Anthropic’s release plan in the middle of a wider fight in artificial intelligence over whether frontier labs should ship stronger models quickly, gate them behind limited access, or hold them back when misuse risk rises. (techcrunch.com) Anthropic has argued for that kind of staged rollout before. When it first published its Responsible Scaling Policy in September 2023, the company said it would not train or deploy models capable of catastrophic harm without matching safety and security measures. (anthropic.com)

For now, the model Anthropic calls its newest frontier system is staying behind a gate, with access tied to bug hunting rather than a public chatbot launch. (anthropic.com)
