Governments vet frontier models pre‑release
- The U.S. Commerce Department’s CAISI signed new pre-release AI testing deals Tuesday with Google DeepMind, Microsoft, and xAI, expanding earlier OpenAI and Anthropic arrangements. (nist.gov)
- CAISI says it has already completed more than 40 evaluations, including on unreleased models, with testing that can include stripped-down safeguards and classified environments. (nist.gov)
- This matters because pre-deployment review is shifting from ad hoc lab policy to a government-backed gatekeeping layer for frontier-model launches. (nist.gov)

Frontier AI policy just got more concrete. On May 5, the U.S. government said CAISI, the Commerce Department’s AI testing arm inside NIST, signed new agreements with Google DeepMind, Microsoft, and xAI […] (nist.gov) […] get early access to frontier systems, “voluntary testing” starts to look a lot like a soft launch approval process. And that is a real shift in who gets to see the most capable models first. (nist.gov)

### What happened today?

CAISI announced the new deals on Tuesday, May 5. The agency said the agreements let it run pre-deployment evaluations […] (nist.gov) […] also noted that OpenAI and Anthropic already had earlier partnerships that were renegotiated to fit the administration’s current AI plan. (nist.gov)

### What is CAISI actually doing?

Basically, CAISI is becoming the federal government’s main technical checkpoint for commercial frontier AI. NIST says the center is now the primary point of contact inside the U.S. government for testing, collaborative research […] (nist.gov) […] completed more than 40 evaluations, including on unreleased models. (nist.gov)

### What does “pre-release vetting” mean here?

It means developers hand over access before launch so government evaluators can probe what the model can do in sensitive domains. NIST says those tests can involve models with stripped-down safeguards […] (nist.gov) […] not the polished public version. The agreements also support testing in classified environments and input from an interagency national-security task force. (nist.gov)

### Is this totally new?

Not really. The U.K. has been doing a version of this since 2023, and the Seoul Summit commitments in 2024 pushed labs to assess severe risks […] (nist.gov) […] companies to sign voluntary agreements with both U.S. and U.K. safety bodies, and described concrete red-teaming work with CAISI and the U.K. AISI. (gov.uk)

### Why are people worried about it?

Because the line between “evaluation” and “permission” is thin. If governments get used to seeing frontier models first, labs may start treating offici[…] (nist.gov) […] more easily than smaller rivals.
A pre-release review regime can improve safety, but it can also harden incumbents’ advantage if the criteria stay vague. This last point is an inference from how compliance systems usually work, not something NIST says outright. (nist.gov)

### Why now?

The government’s own framing is nation[…] (gov.uk) […] domains. Earlier joint U.S.-U.K. testing on OpenAI’s o1 model shows the kind of work this can involve: cyber, biological, and software-development capability evaluations before release. (nist.gov)

### So what changes next?

The immediate change is practical: more major labs are now inside the same federal testing channel. The bigger change is political. Pre-deployment review is no longer just a safety-lab norm or a summit pledge. It is becoming standing government infrastructure, with named agencies, signed agreements, and a growing expectation of early access. (nist.gov)

### Bottom line?

Governments are not fully licensing frontier models yet. But they are moving closer to a world where the strongest systems reach state evaluators before they reach everyone else, and once that habit sets in, it will be hard to unwind. (nist.gov)