Google, Microsoft join U.S. AI tests
- Google DeepMind, Microsoft, and xAI signed agreements on May 5 letting the Commerce Department’s CAISI test frontier AI models before public release.
- CAISI says it has already completed more than 40 evaluations, sometimes on unreleased models with safeguards reduced or removed for testing.
- This extends 2024 deals with OpenAI and Anthropic, pushing U.S. AI oversight from pledges toward structured pre-release scrutiny.
The U.S. government just got a clearer window into some of the most powerful AI systems before the public sees them. Google DeepMind, Microsoft, and xAI have agreed to let the Commerce Department’s Center for AI Standards and Innovation (CAISI) evaluate frontier models ahead of release.

That matters because the big fear around advanced AI is no longer just chatbots saying weird things. It is models helping with cyberattacks, exposing dangerous scientific know-how, or slipping past their own safeguards. The change here is simple: less trust-me, more test-it.

### What actually changed?

On May 5, CAISI announced new agreements with Google DeepMind, Microsoft, and xAI for pre-deployment evaluations and related research on frontier AI systems. The center sits inside the Commerce Department’s National Institute of Standards and Technology (NIST), and it says the deals also allow post-deployment assessment and work in classified environments. (nist.gov)

### What is CAISI doing with these models?

Basically, CAISI is stress-testing them. The center says it evaluates frontier models for national-security-relevant capabilities and risks, then feeds the results back to companies and government agencies. In some cases, developers provide versions with safeguards reduced or removed so evaluators can see what the model can really do without the bumpers on. (nist.gov)

### Why does “before release” matter so much?

Because after release, the incentives get messy fast. Once a model is public or widely deployed, companies have product momentum, customers, and reputation on the line. Pre-release testing gives the government a shot at spotting dangerous capabilities before they sp[…] (nist.gov) […]rity and large-scale public-safety testing needs government expertise, not just internal company checks. (blogs.microsoft.com)

### What kinds of risks are they worried about?

The core worries are cyber, bio, chemical, and broader misuse risks.
CAISI’s public description focuses on national security implications, and reporting around the new agreements points to concerns that top-e[…] (blogs.microsoft.com) […]model becomes a force multiplier for people trying to do real harm. (nist.gov)

### Is this mandatory regulation?

Not yet. The agreements are voluntary. That is the catch. CAISI can evaluate because the companies agreed to share access, not because a law forces every frontier lab to submit models for clearance. But voluntary does not mean trivial. Once the biggest U.S. labs are in the system, a norm starts to form, and norms have a way of turning into expectations, then policy. (nist.gov)

### Are these the first companies to do this?

No. CAISI said these deals build on earlier 2024 partnerships with OpenAI and Anthropic, which were renegotiated under the center’s current mandate. So the real milestone is breadth. With Google DeepMind, Microsoft, and xAI joining, the government now has arrangements with most of the top U.S. frontier-model developers. (nist.gov)

### Why now?

Part of the answer is capability creep. Frontier models keep getting better at coding, tool use, and scientific reasoning. That is great for productivity, but also useful for attackers. CNBC reported that Anthropic’s recent Mythos model sharpened government attention because of its ability to fin[…] (nist.gov) […]inside Washington. (cnbc.com)

### So what is the bottom line?

The U.S. still does not have a full licensing regime for frontier AI. But it now has something more concrete than company safety promises. If these agreements hold, the path to launching top-tier AI in America is starting to include an outside exam first, and that is a real shift.