Washington weighs AI pre‑release vetting

- NIST said Google DeepMind, Microsoft, and xAI will hand frontier models to its CAISI unit for testing before release, as Washington weighs broader mandates.
- CAISI says it has already run more than 40 model evaluations, including some unreleased systems, while White House aides compare the idea to FDA-style review.
- That would be a real shift from voluntary AI safety work toward gatekeeping for models seen as cyber or national-security risks.

AI policy in Washington is starting to look less like abstract principles and more like a checkpoint. This week, NIST said Google DeepMind, Microsoft, and xAI agreed to let the government test frontier models before release, and White House officials openly discussed an executive order that could formalize that kind of review.

The big deal is simple. For the last couple of years, U.S. AI oversight has mostly meant voluntary promises, red-teaming, and safety frameworks. Now the conversation is shifting toward something closer to pre-release clearance for the most capable systems, especially the ones that could help with cyberattacks or other national-security problems.

### What actually changed?

On May 5, NIST announced that Google DeepMind, Microsoft, and xAI signed agreements with its Center for AI Standards and Innovation, or CAISI. (thehill.com) Those deals let CAISI run “pre-deployment evaluations” and follow-up research on the companies’ frontier models. OpenAI and Anthropic had already signed similar agreements in 2024, so this is less a brand-new program than a rapid expansion of one that suddenly matters a lot more. (nist.gov)

### What is CAISI?

CAISI is the Commerce Department’s testing shop for advanced AI. Its job is to measure model capabilities, probe security weaknesses, and build evaluation methods that other parts of government can use. NIST says the center focuses on risks tied to cybersecurity, biosecurity, chemical weapons, foreign influence, and broader national-security concerns.

### Why is the White House suddenly interested?

Because frontier models are getting good enough to make the downside feel immediate.
(thehill.com) Kevin Hassett, who runs the National Economic Council, said the administration is studying an executive order so future AI systems that could create vulnerabilities go through a process before they are “released in the wild.” He framed it like drug review, not because AI works like medicine but because the White House wants a safety gate before public launch. (nist.gov)

### What spooked them?

A lot of this turns on fears that a strong model could speed up offensive cyber work. Federal News Network tied the latest push to Anthropic’s “Mythos” disclosure, which raised alarms about AI systems finding and exploiting software flaws faster than defenders can patch them. That kind of capability changes the policy mood fast, because the harm is not theoretical if a model can hand attackers a workflow. (federalnewsnetwork.com)

### Is this already mandatory?

Not yet. The current agreements are voluntary. But they create the plumbing for something stricter later: shared testing channels, government access to unreleased models, and a norm that frontier developers should submit systems for review. The Hill also noted that CAISI has already completed more than 40 evaluations, including some on models that had not launched yet, so the machinery is not hypothetical. (federalnewsnetwork.com)

### What does the government think it can measure?

Quite a lot, turns out. CAISI recently published an evaluation of DeepSeek V4 Pro using benchmark suites across cyber, software engineering, science, reasoning, and math. The point was not just leaderboard bragging. It showed CAISI is trying to build a repeatable way to compare capability levels, identify risky strengths, and use some non-public tests that are harder for model makers to game. (thehill.com)

### Why does this matter for AI companies?

Because pre-release vetting changes who controls the launch button. A voluntary safety framework lets companies decide when evidence is good enough.
A government review system, even a narrow one for frontier models, means delays, documentation, and probably fights over what counts as “safe.” It also cuts against the Trump administration’s earlier light-touch, pro-innovation posture on AI. (nist.gov)

### Bottom line

Washington has not built an FDA for AI. But it is clearly testing the idea. And once companies start handing over unreleased models for government evaluation, the distance between “voluntary testing” and “you need clearance first” gets a lot shorter. (thehill.com)

Shared from Scout - Be the smartest in the room.