U.S. gains early access to AI models

- On May 5, NIST’s CAISI signed new deals with Google DeepMind, Microsoft, and xAI to test frontier AI models before public release.
- The program now covers five major U.S. labs, with OpenAI and Anthropic already inside, and CAISI says it has completed 40-plus evaluations.
- That shifts oversight earlier, from reacting after launch to probing cyber and national-security risks before powerful models ship.

The U.S. government is getting a look at major AI models before the public does. That is the news. On May 5, the Commerce Department’s AI testing arm, the Center for AI Standards and Innovation (CAISI), said Google DeepMind, Microsoft, and xAI will now let federal evaluators examine frontier models before release. (nist.gov)

Why does that matter? Because the hardest part of AI oversight has been timing. Once a model is public, the capabilities are already out in the world. If a system is unusually good at hacking, helping with dangerous biological workflows, or dodging safeguards, regulators are mostly playing catch-up. CAISI is trying to move the inspection point earlier. (nist.gov)

### What is CAISI, exactly?

CAISI sits inside NIST at the Commerce Department. Its job is not classic product approval; this is not the FDA for AI. Basically, it is the government’s measurement-and-testing shop for advanced commercial AI, with a specific national-security brief. NIST says CAISI […] systems. (nist.gov)

### What changed this week?

Three more companies joined formal pre-release testing agreements: Google DeepMind, Microsoft, and xAI. Those agreements let CAISI run pre-deployment evaluations, post-deployment assessments, and related research. This builds on earlier 2024 agreements with OpenAI and Anthropic, which already gave the government access to major new models before and after launch. (nist.gov)

### So how big is the program now?

Big enough that it covers the core U.S. frontier-model group. With the new deals, CAISI now has formal arrangements with five major labs: OpenAI, Anthropic, Google DeepMind, Microsoft, and xAI. NIST says it has already completed more than 40 evaluations, including tests on unreleased state-of-the-art models. That means this is not a pilot anymore. It is becoming routine infrastructure. (nist.gov)

### What are government testers looking for?

The focus is national-security-relevant capability. NIST lists cybersecurity, biosecurity, chemical-weapons risk, security vulnerabilities, and even the possibility of backdoors or covert malicious behavior in foreign AI systems as part of CAISI’s remit […] wrong hands. (nist.gov)

### How do these tests actually work?

We have one concrete example from late 2024, when the U.S. and U.K. AI Safety Institutes tested OpenAI’s o1 before release. They ran cyber, biological, and software-development evaluations using question answering, agent tasks in virtual environments, and expert qualitative probing. Findings went back to OpenAI before launch. That gives a pretty good […] scaling up now. (nist.gov)

### Why do labs agree to this?

Partly because the arrangement is voluntary and collaborative, not a formal licensing regime. NIST says the deals support information-sharing and product improvements. There is also a practical reason: if government testers are going to evaluate models anyway, companies would rather do it in a controlled channel before a public controversy forces the issue. (nist.gov)

### What is the catch?

Early access is not the same thing as a legal veto. CAISI can test, compare, and warn, but these agreements do not automatically give the government power to block release. So the shift is real, but it is softer than it sounds. Think less “licensing gate” and more “official […] something they cannot ignore. (nist.gov)

The bottom line is simple. Washington is building a habit of seeing powerful AI before everyone else does. That does not solve the governance problem. But it does move the government from spectator to early examiner, and for frontier models, timing is half the battle.

Shared from Scout - Be the smartest in the room.