Microsoft, Google and xAI open unreleased models to government testing

- Google DeepMind, Microsoft, and xAI signed new deals on May 5 letting Commerce Department testers examine unreleased frontier AI models before launch.
- The testing will run through NIST's CAISI, including classified settings and interagency reviews aimed at cyber and national-security risks.
- It matters because Washington is moving from voluntary promises toward a real pre-release AI vetting system.

The U.S. government just got a closer look inside the AI industry's launch pipeline. Google DeepMind, Microsoft, and xAI have agreed to let federal officials test some unreleased frontier models before the public sees them. That sounds narrow, but it is a real shift. Until recently, most of this kind of safety checking happened inside the companies themselves, with only limited outside scrutiny.

### What actually changed?

On May 5, the Commerce Department's Center for AI Standards and Innovation (CAISI) announced new agreements with Google DeepMind, Microsoft, and xAI. The deals let CAISI evaluate advanced models, and conduct related research on them, before they are released. They also build on earlier partnerships the government had already set up with other frontier labs, which were later renegotiated under the current administration. (content.govdelivery.com)

### What is CAISI?

CAISI sits inside NIST, the standards agency best known for technical benchmarks rather than splashy policy fights. In effect, the government is using a measurement-and-testing shop, not a traditional regulator, as its point of entry into frontier AI oversight. NIST says CAISI is now the government's primary point of contact for testing, collaborative research, and best-practice development around commercial AI systems. (content.govdelivery.com)

### What are they testing for?

The focus is national security and public safety, especially whether a model's capabilities create cyber risk. That includes the fear that a very capable model might help users discover serious software vulnerabilities, automate harmful research, or otherwise cross the line from useful assistant to dangerous force multiplier. CAISI has also said an interagency task force will let officials from across government test models, including in classified environments. (content.govdelivery.com)

### Why do classified settings matter?

Because some of the most sensitive tests involve threat information the public never sees. If officials want to probe whether a model can help with offensive cyber operations or other high-risk tasks, they may need intelligence, exploit data, or secure workflows that cannot sit in an ordinary product-testing lab. So this is not just a trust-and-safety exercise; it is much closer to a national-security red-team process. (cybersecuritydive.com)

### Is this mandatory?

Not yet. The current deals are voluntary, and that is the key catch. The companies agreed to participate, and the government says the work should support information-sharing and product improvements, but CAISI has not laid out a fully public, binding standard that every frontier model must pass. At the same time, reporting says the White House is considering a more formal review system, which would push this beyond handshake-style cooperation. (cybersecuritydive.com)

### Why now?

Part of the answer is that frontier models are getting better at exactly the things governments worry about most. Recent concern around Anthropic's Claude Mythos appears to have sharpened that urgency, especially on cybersecurity. The administration had earlier pulled back some AI safety measures, but this move shows it is not staying fully hands-off where national-security risk is concerned. (politico.com)

### What does this mean for buyers?

For hospitals, nonprofits, schools, and enterprise customers, the immediate effect is indirect rather than dramatic. Over time, though, pre-release government testing could become a trust signal: not a guarantee, but a sign that a model faced tougher scrutiny before deployment. If Washington eventually turns this into a standard process, vendors may have to plan launches around government evaluation windows the same way they already plan around security reviews and compliance checks. (cybersecuritydive.com)

### Bottom line?

This is the U.S. government trying to get ahead of frontier AI risk before the next model is already everywhere. The agreements are voluntary for now, but the direction is clear: pre-release testing is moving from an informal idea to real infrastructure. (content.govdelivery.com)

Shared from Scout - Be the smartest in the room.