Google, Microsoft join US AI testing

- Google DeepMind, Microsoft, and xAI signed new May 5 agreements letting the Commerce Department’s CAISI test frontier AI models before public release.
- The reviews target national-security risks, especially cybersecurity, biosecurity, and chemical-weapons misuse, and build on more than 40 prior government AI evaluations.
- The move pushes U.S. oversight upstream, from reacting after launch to voluntary pre-deployment checks by major frontier labs.

The story here is AI testing: not in the abstract, but before the public gets the model. That is the shift. On May 5, Google DeepMind, Microsoft, and xAI signed new agreements with the Commerce Department’s Center for AI Standards and Innovation, or CAISI, to let the government evaluate frontier models before release. That matters because the hardest AI safety problems are often the ones you do not want to discover after a model is already everywhere. (nist.gov)

### What actually changed?

CAISI already existed inside NIST, and the government had earlier partnerships with some AI companies. But these new agreements expand that setup and bring in three more major labs under a pre-deployment testing process. The point is early access: the government gets to examine advanced systems before they ship, not just study them after the fact. (nist.gov)

### What is CAISI, exactly?

CAISI is the Commerce Department’s AI testing arm inside NIST. It used to be more narrowly framed around AI safety, but now it is being positioned as the government’s main hub for evaluating commercial frontier models, doing collaborative research, and shaping testing practices. Basically, if Washington wants a technical look at what a powerful model can do, this is the office meant to do it. (nist.gov)

### What are they testing for?

The big focus is national-security risk. That includes whether a model could materially help with cyberattacks, biosecurity misuse, or chemical-weapons-related harm. Those are the categories that keep coming up because they are concrete, high-stakes, and hard to reverse once a system is deployed at scale. (euronews.com)

### Why does pre-release access matter so much?

Because timing is the whole game. A model can look manageable in a demo and become much riskier once outsiders start probing it at scale. Pre-release testing gives evaluators a chance to stress the model, examine safeguards, and flag dangerous capabilities while the company can still change the system. Think of it less like policing speech and more like crash-testing a car before it goes on sale. (nist.gov)

### Is this regulation?

Not really, at least not in the formal sense. These are voluntary agreements, not a new law or binding licensing regime imposed across the industry. But voluntary does not mean trivial. When several top labs accept a common testing channel with the federal government, that starts to look like a norm the rest of the market may have to answer to. (cnbc.com)

### Why now?

Part of the answer is simple: frontier models keep getting more capable, and the government does not want to be blind until after launch. CAISI said it has already completed more than 40 evaluations, which suggests the testing machinery is no longer just a pilot project. The new agreements look like an attempt to scale that machinery while aligning it with the administration’s current AI action plan. (thehill.com)

### Who is still outside this?

Not everyone is new to the table. OpenAI and Anthropic already had partnerships with the center and renegotiated them under the current setup. So the broader picture is that most of the biggest U.S. frontier labs are now, in one form or another, inside the same government testing orbit. That is a bigger deal than any single company signing on this week. (bloomberg.com)

### What is the bottom line?

The U.S. still does not have a full AI licensing regime. But it now has something more practical than speeches: a pipeline for getting unreleased models into government hands for security testing. That will not settle the AI regulation fight. It does mean the default is changing: if you build a top-tier model, people increasingly expect it to face scrutiny before launch, not after. (nist.gov)
