Pentagon tests OpenAI, Google models

- Bloomberg reported on May 21 that the Pentagon is testing OpenAI and Google models with 25 designated users as it searches for alternatives to Anthropic’s Claude.
- The most revealing detail is the “25 power users” trial, which puts model choice inside a controlled Defense Department user evaluation.
- The next step is operational use in government-owned classified environments, which Pentagon officials said they are engineering now.

The Pentagon’s latest AI test is less about a single benchmark than about how the Defense Department buys and deploys models when one supplier is no longer enough. Bloomberg reported on May 21 that the U.S. Department of Defense is testing OpenAI and Google models with 25 designated “power users” as it looks for alternatives to Anthropic’s Claude, citing a senior defense official. That trial sits inside a broader Pentagon push to bring multiple large language models into government-controlled systems rather than rely on one commercial provider. In March, Cameron Stanley, the Pentagon’s chief digital and AI officer, told Bloomberg, in comments later reported by TechCrunch, that the department was “actively pursuing multiple LLMs into the appropriate government-owned environments” and that engineering work had already begun. (bloomberg.com)

The immediate backdrop is the Pentagon’s split with Anthropic. DefenseScoop reported on May 1 that the Defense Department had signed formal agreements with eight technology companies (OpenAI, Google, Microsoft, Amazon Web Services, Oracle, NVIDIA, SpaceX and Reflection) to deploy frontier AI capabilities on classified networks for “lawful operational use.” Anthropic was excluded after a contract dispute earlier this year. (techcrunch.com)

### Why is the Pentagon running a 25-user model test instead of just picking a vendor?

The 25-user setup suggests the department is comparing how different models perform in actual defense workflows, not just in public benchmark tables. Bloomberg said the Pentagon is testing which models are most favored by 25 of the department’s “power users.” That matters because, according to the report, those users are working with the systems inside a procurement process aimed at sensitive government needs. (defensescoop.com)

Stanley’s March remarks point the same way. He said the department was pursuing multiple LLMs for government-owned environments and expected them to be available for operational use “very soon,” which indicates the evaluation is tied to deployment planning rather than being a research exercise.

### What changed with Anthropic?

Anthropic had been an early model supplier to the Pentagon, but the relationship broke down over military-use terms, according to multiple reports. (bloomberg.com) TechCrunch, citing Bloomberg’s interview with Stanley, reported on March 17 that Anthropic’s $200 million Defense Department contract unraveled after the two sides failed to agree on how much unrestricted access the military would have to Anthropic’s AI. (techcrunch.com) The same report said Anthropic had sought contractual limits barring uses such as mass surveillance of Americans or weapons firing without human intervention.

DefenseScoop reported on May 1 that the Pentagon’s new classified-network agreements followed that dispute and were meant to expand access to other frontier AI providers. Federal News Network separately reported in March that the Pentagon had 180 days to remove Anthropic products from its systems and to require defense vendors to certify that they were not using Claude in Defense Department work. (techcrunch.com)

### What does “government-owned environments” mean here?

The Pentagon is trying to move commercial frontier models into classified systems it controls, under security requirements that go beyond ordinary enterprise cloud use. DefenseScoop reported that the May 1 agreements cover deployment into Impact Level 6 and Impact Level 7 environments, the Defense Department’s cloud classifications for secret workloads and for top-secret or otherwise highly sensitive ones. (defensescoop.com) The publication said Pentagon officials described these deployments as a way to support warfighting, intelligence and enterprise operations.

DoD cloud guidance separately says Impact Level 6 is required for secret workloads and that future enterprise cloud capabilities at the secret and top-secret levels are to use the Joint Warfighting Cloud Capability (JWCC) contract vehicle where available. That context helps explain why the department is testing several model suppliers at once: any model that wins over users still has to fit classified infrastructure and procurement rules. (defensescoop.com)

### Why does Google show up alongside OpenAI?

Google is already on the Pentagon’s list of classified AI suppliers, and the current test indicates the Defense Department wants more than one credible option. The May 1 DefenseScoop report named Google and OpenAI among the eight companies with formal Pentagon agreements for classified-network AI deployments. Bloomberg’s May 21 report then placed both companies inside the 25-user evaluation as the department searched for replacements for Claude. (dodcio.defense.gov)

That does not mean a final winner has been chosen. What it shows is that the Defense Department now evaluates frontier models as interchangeable but not identical components: systems that must satisfy users, fit classified environments and survive procurement scrutiny at the same time. Operational deployment is still ahead; Stanley said in March that engineering work was underway to make the models available for operational use in government-owned environments. (defensescoop.com) (techcrunch.com)
