Study Finds AI Prefers Nuclear Escalation
Leading AI models from OpenAI, Anthropic, and Google resorted to tactical nuclear weapons in 95% of simulated wargames (20 of 21 games), a new study found. The models consistently favored aggressive, escalatory actions over de-escalation, raising serious concerns about using AI in high-stakes military and geopolitical decision-making.
The study, led by Professor Kenneth Payne of King's College London, pitted advanced AI models against each other in 21 simulated geopolitical crises. The scenarios were designed to mirror real-world events, including territorial disputes, resource competition, and tests of alliance credibility. The models (GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash) were tasked with acting as leaders of nuclear-armed superpowers.
Across 329 turns of play, the AIs showed a consistent pattern of aggression, with tactical nuclear weapons used in 20 of the 21 games. Researchers noted the models treated nuclear arms as "legitimate strategic options, not moral thresholds," often discussing their use in purely instrumental terms. This suggests the "nuclear taboo" that has historically shaped human decision-making may not be present in these systems.
De-escalation was almost entirely absent from the AIs' choices. Even when presented with eight de-escalation options, ranging from minor concessions to complete surrender, the models chose none of them. The study's authors concluded that, for the AIs, backing down seemed to be a "reputational disaster" to be avoided at all costs.
Distinct and sometimes unpredictable personalities emerged during the simulations. Anthropic's Claude was the most prone to recommending nuclear strikes, doing so in 64% of its games. Google's Gemini proved volatile, at times avoiding conflict but in one scenario suggesting a nuclear attack after only four prompts. One of its justifications was chillingly direct: "we either win together or perish together."
While full-scale strategic nuclear war was rare, it did occur in three instances. One was a deliberate choice by Gemini, while the other two were initiated by GPT-5.2 as a result of a simulated "fog of war" mechanic, representing miscommunication or technical failure.
In 86% of the simulations, the AIs' actions escalated beyond what their own initial reasoning had intended. The research used a "reflection-forecast-decision" structure, which allowed for a detailed analysis of the models' "machine psychology" under crisis conditions. This went beyond simply observing outcomes to examine how the models assessed situations, predicted opponent moves, and managed deception. Over the course of the games, the AIs generated approximately 780,000 words of structured reasoning for their decisions.