US agency to safety-test frontier AI

- Google DeepMind, Microsoft, and xAI signed new May 5 agreements letting Commerce’s CAISI test frontier AI models before public release for security risks.
- CAISI says it has already completed more than 40 evaluations, including unreleased models, and can test systems in classified settings with safeguards reduced.
- The bigger shift is federal pre-release AI oversight moving from ad hoc pledges toward a standing review channel.

Artificial intelligence policy just got more concrete. On May 5, the Commerce Department’s Center for AI Standards and Innovation (CAISI) said Google DeepMind, Microsoft, and xAI will give the U.S. government access to frontier models before those systems go public. The point is not general product review. It is national-security testing: the kind of work that asks whether a model can meaningfully help with cyberattacks, bio risks, or other dangerous capabilities. (nist.gov)

### What is CAISI, exactly?

CAISI sits inside NIST, the Commerce Department’s standards arm. It is being positioned as the government’s main contact point for testing commercial AI systems, coordinating research, and developing best practices around advanced models. That matters because AI oversight in the U.S. has often looked scattered: lots of voluntary promises, fewer durable institutions. (nist.gov)

### What changed this week?

The new piece is pre-deployment access from three more major labs: Google DeepMind, Microsoft, and xAI. CAISI said these deals expand and renegotiate earlier partnerships to fit current administration priorities and America’s AI Action Plan. In plain English, the government is trying to move from “please tell us what you built” to “let us inspect it before everyone else gets it.” (nist.gov)

### What does “testing” mean here?

It means CAISI can evaluate models before release, keep evaluating them after release, and run targeted research on what the systems can do at their limits. The agency says developers often provide versions with safeguards reduced or removed so evaluators can see the model’s raw capabilities [...] underlying model is much more capable. (nist.gov)

### Why do classified environments matter?

Because some of the most sensitive tests cannot happen on an ordinary corporate demo server. CAISI says the agreements support testing in classified environments and let evaluators from across government participate through its TRAINS task force. In short, this is built for national-security review, not just academic benchmarking. (nist.gov)

### How big is this program already?

Bigger than the announcement makes it sound. CAISI says it has already completed more than 40 evaluations, including on state-of-the-art models that have not been released publicly. That suggests this is no longer a pilot. It is turning into a standing process, even if the companies are still participating voluntarily. (nist.gov)

### Is this mandatory?

Not yet, at least from what is public. The current agreements are voluntary. But the direction of travel is clear: the White House is weighing a more formal government review process for advanced models before deployment. So the likely sequence is voluntary first, then potentially a firmer framework later if officials decide the risks justify it. That last step is still a proposal, not policy. (nextgov.com)

### Why are officials pushing now?

Because frontier models are crossing from “impressive software” into “dual-use infrastructure.” The same system that writes code faster can also help find security flaws faster. The same model that summarizes biology papers can lower the barrier to dangerous biological know-how. CAISI has already been testing those edges, and officials clearly want earlier visibility before a lab ships something stronger. (nist.gov)

### What is the bottom line?

The U.S. still does not have a full licensing regime for frontier AI. But it now has something more practical than broad principles: a government channel that gets early access, runs adversarial tests, and feeds findings back before release. That is a real shift in how Washington is trying to govern advanced AI: less speechmaking, more inspection. (nist.gov)
