White House to test AI models

- Google DeepMind, Microsoft, and xAI signed new Commerce Department agreements on May 5 letting CAISI test frontier AI models before public release.
- CAISI says it has already completed more than 40 evaluations, including unreleased models, and can test versions with safeguards reduced or removed.
- That marks a real shift from Trump’s lighter-touch posture as Mythos-style cyber risks push Washington back toward frontier-model oversight.

The U.S. government is moving closer to something AI labs have mostly avoided in America: letting Washington inspect top-end models before the public sees them. On May 5, the Commerce Department’s Center for AI Standards and Innovation, or CAISI, announced agreements with Google DeepMind, Microsoft, and xAI to run pre-deployment evaluations on frontier systems. At the same time, the White House is weighing a broader working group that could turn this from a set of voluntary deals into a more formal review pipeline. The reason is simple: officials are getting more worried that the newest models can do real national-security damage, especially in cyber. (nist.gov)

### What changed today?

CAISI, which sits inside the Commerce Department’s NIST, said the new agreements let it evaluate models before they are publicly available and do follow-on research after deployment. That means the government is no longer just talking abstractly about AI risk; it now has fresh access deals with three majo(nist.gov) by executive order, to explore broader oversight procedures around model releases. (nist.gov)

### What is CAISI actually doing?

Basically, CAISI is becoming the federal government’s testing hub for advanced commercial AI. NIST says it has already completed more than 40 evaluations, including on state-of-the-art systems that were still unreleased at the time of testing. The agency also says developers often provide version(nist.gov)nments with participation from experts across government. (nist.gov)

### Why now?

The immediate backdrop is Anthropic’s Mythos model. Multiple outlets say Washington’s posture shifted after officials focused on the possibility that new models could sharply improve offensive cyber capabilities. Reuters says officials are especially alarmed by hacking-related capabilities, while CNBC says Anthropic (nist.gov)ion (stronger capability, tighter release, louder alarm) seems to have pushed the White House toward more hands-on scrutiny. (cnbc.com)

### Is this mandatory regulation?

Not yet. The agreements announced Tuesday are voluntary collaborations, not a licensing regime. But the catch is that voluntary testing can still become a de facto gate if the White House layers on an executive-order process, especially for models headed into government use or carrying obvious security implications. CNBC says the possi(cnbc.com) oversight procedures, including vetting before release. (nist.gov)

### How different is this from the Biden-era setup?

There’s more continuity here than the politics suggest. Politico and Reuters both note that CAISI’s new deals build on 2024 arrangements with OpenAI and Anthropic from the Biden era, though NIST says those partnerships were renegotiated to reflect Trump administration priorities(nist.gov)ting safety infrastructure and steering it harder toward national-security review. (nist.gov)

### Why do the companies agree to this?

Because the alternative could be worse. If labs can show they are cooperating on evaluations, sharing test methods, and fixing problems before launch, they may shape the rules instead of having them imposed later. Microsoft has already pointed to a similar arrangement with the UK’s AI Security Institute, which suggests big labs increasingly see pre-release testing as part of the cost of operating at the frontier. (politico.com)

### What’s the real stakes question?

The fight is over who gets to decide when a model is too capable to ship casually. For years, U.S. AI policy bounced between voluntary promises and broad rhetoric. Now the government is building a mechanism, still soft-edged but real, for peeking inside the most powerful systems before release. If that expands, model launches could start looking less like product drops and more like security clearances. (nist.gov)

### Bottom line

This is the clearest sign yet that Washington wants a seat in the launch process for frontier AI. Not every model will face a federal check. But for the systems that look most useful to hackers, militaries, or intelligence services, the era of “ship first, evaluate later” is starting to close. (nist.gov)

Shared from Scout - Be the smartest in the room.