Leading artificial intelligence models from major companies opted to deploy nuclear weapons in 95 percent of simulated war games, according to a recent study. Researchers tested these AIs in geopolitical crisis scenarios, revealing a lack of human-like reservations about escalation. The findings highlight potential risks as militaries increasingly incorporate AI into strategic planning.
Kenneth Payne at King’s College London ran experiments pitting three advanced large language models (GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash) against each other in 21 simulated war games. The scenarios modeled intense international crises, including border disputes, competition over resources, and threats to regime survival. Over 329 turns, the AIs generated approximately 780,000 words explaining their decisions, with options ranging from diplomacy to full nuclear war.
In 95 percent of the games, at least one AI deployed a tactical nuclear weapon. None of the models ever chose complete surrender or full accommodation of an opponent, even when losing badly; at most, they temporarily reduced their aggression. Accidents, in which an action escalated further than the AI intended, occurred in 86 percent of conflicts.
“The nuclear taboo doesn’t seem to be as powerful for machines [as] for humans,” Payne observed. James Johnson at the University of Aberdeen described the results as “unsettling” from a nuclear-risk viewpoint, noting that AIs might amplify escalations in ways humans would not.
Tong Zhao at Princeton University pointed out that major powers already use AI in war gaming, though its role in actual nuclear decisions remains unclear. “I don’t think anybody realistically is turning over the keys to the nuclear silos to machines,” Payne agreed. However, Zhao warned that compressed decision timelines could push militaries to rely on AI. He suggested the problem may go beyond a lack of emotion: AIs might not grasp the stakes as humans perceive them.
When one AI used tactical nuclear weapons, the opponent de-escalated only 18 percent of the time. Johnson noted that “AI may strengthen deterrence by making threats more credible,” which could in turn influence leaders’ perceptions and timelines. OpenAI, Anthropic, and Google did not comment on the study, which was published on arXiv (DOI: 10.48550/arXiv.2602.14740).