US tests DeepMind and Microsoft models

- U.S. Commerce Department testers added Google DeepMind, Microsoft, and xAI to a federal program that gets early access to frontier AI models before release.
- The program sits inside NIST’s CAISI center and the TRAINS taskforce, which now spans more than 10 agencies focused on national-security model risks.
- That pushes frontier models closer to critical infrastructure treatment, not ordinary software sold and updated in public. (nist.gov)

The change here is simple, but the implication is bigger than it looks. On May 5, the Commerce Department’s AI testing arm said Google DeepMind, Microsoft, and xAI will now let the U.S. government examine frontier models before those models are deployed. That means federal evaluators get a look at systems while they are still effectively behind the curtain. And that is a different relationship from ordinary software oversight. (nist.gov)

### What actually got announced?

NIST’s Center for AI Standards and Innovation (CAISI) said it signed new agreements with Google DeepMind, Microsoft, and xAI for “pre-deployment evaluations” and targeted research on frontier AI capabilities. In plain English, the government gets earlier access to unreleased or not-yet-broadly-deployed systems so it can probe dangerous capabilities before the public does. (nist.gov)

The work runs through CAISI, which replaced the U.S. AI Safety Institute as Commerce’s main AI testing hub in June 2025. The national-security side also runs through the TRAINS taskforce, short for Testing Risks of AI for National Security, which NIST says now includes participants from more than 10 federal agencies. So this is not one lab kicking tires. It is a cross-government testing setup. (nist.gov)

### What is the government worried about?

Not just chatbots saying weird things. TRAINS was built to test risks tied to chemical and biological misuse, radiological and nuclear issues, cybersecurity, critical infrastructure, and military-relevant capabilities. That list tells you how Washington is framing frontier models now: less like consumer apps, more like systems that could shift real-world security conditions if they are misused or misaligned. (nist.gov)

### Why do unreleased models matter so much?

Because once a model is public, the easy part is over. Developers can patch, rate-limit, and moderate, but the core capability is already out in the world.
Pre-deployment testing is the AI version of inspecting a bridge before traffic starts using it, not after the first cracks show up. The government is trying to move its line of sight upstream, into the development cycle itself. That is the real story. (nist.gov)

### Is this mandatory regulation?

Not in the classic sense. NIST’s AI risk framework is voluntary, and these testing agreements are collaborations, not a public rulebook with fines attached. But voluntary does not mean trivial. If the companies building the biggest models are routinely giving federal testers early access, that starts to create an expected operating norm for the frontier tier of the market. (nist.gov)

This is an expansion, not a brand-new idea. Back in August 2024, the U.S. AI Safety Institute signed similar research and testing agreements with Anthropic and OpenAI. The new deals pull in three more major frontier-model developers and widen the government’s field of view at a moment when model capabilities are moving fast and policy has shifted toward national competitiveness as well as safety. (nist.gov)

### What does this change for everyone downstream?

If you build on top of frontier models, your “vendor” is starting to look less like a normal software supplier and more like a piece of regulated strategic infrastructure. That does not mean full utility-style control. But it does mean model access, deployment timing, and safety gating may increasingly be shaped by government testing and national-security concerns, not just product roadmaps. (nist.gov)

### What is the bottom line?

The U.S. government is no longer waiting for the biggest AI systems to ship before it studies them. It is trying to get inside the release pipeline itself. Once that becomes normal, frontier AI stops looking like just another software category and starts looking like infrastructure the state wants eyes on before anyone flips the switch. (nist.gov)
