Safety checks now gate model releases
Coverage this week highlights that frontier‑model safety is increasingly being treated as a deployment gate inside companies, with some advanced systems judged too risky for public release and staged access becoming the norm.
Frontier artificial intelligence companies are increasingly treating safety checks as a release gate, not a research side project. Some models now clear internal benchmarks but still do not clear public launch. (openai.com)

Anthropic made that explicit on April 7, 2026, when it introduced Claude Mythos Preview and said it would not make the model generally available. The company instead limited access to launch partners and more than 40 additional organizations working on critical software security. (anthropic.com, anthropic.com)

Anthropic said Mythos Preview is its “most capable frontier model to date,” but its system card says the model’s capability jump led the company to keep it out of broad release. The restricted rollout sits inside Project Glasswing, which includes Amazon Web Services, Apple, Cisco, Google, Microsoft, NVIDIA, JPMorganChase, and others, plus up to $100 million in usage credits and $4 million in donations to open-source security groups. (anthropic.com, anthropic.com)

The basic idea is simple: a frontier model is a general-purpose system at the leading edge of capability, and companies now test whether it can cause severe harm before they decide who gets access. OpenAI’s updated Preparedness Framework, published April 15, 2025, says it is meant to track advanced capabilities that could introduce “severe harm” and to define what it means to minimize those risks in practice. (openai.com)

Anthropic has built the same logic into policy. Its Responsible Scaling Policy, updated to version 3.1 on April 2, 2026, says the company can pause development when it deems that appropriate, and an earlier version assigns a Responsible Scaling Officer to approve model training or deployment decisions based on capability and safeguard assessments. (anthropic.com, anthropic.com)

Google DeepMind has moved in the same direction.
On September 22, 2025, it published version 3 of its Frontier Safety Framework, adding risk thresholds for harmful manipulation and for systems that could accelerate artificial intelligence research and development to destabilizing levels. (deepmind.google)

OpenAI has also formalized launch oversight above the product team. In an update published September 25, 2024, the company said its Safety and Security Committee and full board have authority to delay a model release until safety concerns are addressed. (openai.com)

That does not mean companies have stopped shipping powerful models; it means they are increasingly staging access. OpenAI’s open-model page shows it now offers open-weight models such as gpt-oss in 120 billion and 20 billion parameter versions, alongside a separate “gpt-oss-safeguard” line built for custom safety policies. (openai.com)

Outside researchers have been tracking the pattern across labs. A December 9, 2025 review from Model Evaluation and Threat Research said frontier developers have adopted corporate protocols that tie model evaluations to deployment safeguards, information security measures, and accountability practices. (metr.org)

The shift is visible in what companies publish with each launch: system cards, risk thresholds, board-review processes, and named partner programs. The new question is no longer only whether a model works, but whether a company will let everyone use it at once. (anthropic.com, openai.com, deepmind.google)