Wargames and AI: A dangerous mix that needs ethical oversight

By Ivanka Barzashka, December 4, 2023

US Naval Postgraduate School students participate in analytic wargames they designed to explore solutions for some of the Department of Defense's most pressing national security concerns. Credit: Javier Chagoya, Public domain, via Wikimedia Commons

In early November, world leaders assembled for the first global Artificial Intelligence (AI) Safety Summit at Bletchley Park, the once top-secret British site where codebreaking technology helped secure victory in World War II. The summit aimed to understand risks from frontier AI (highly capable general-purpose models that can perform a wide variety of tasks), particularly when used by “bad actors,” and galvanize international action.

Missing from the summit’s agenda was AI’s use by state actors for national security applications, which could soon transform geopolitics and warfare. Killer robots aren’t necessarily the biggest risk. Instead, AI systems could sift through data to identify competitive advantages, generate new adversary strategies, and evaluate the conditions under which wars can be won or lost. This can be achieved via the fusion of AI with wargames—defined by NATO as “representations of conflict or competition in a safe-to-fail environment, in which people make decisions and respond to the consequences of those decisions.” A centuries-old art, wargaming is only now emerging as a science and an academic discipline.

AI’s integration into wargames can subtly influence leadership decisions on war and peace, and could possibly lead to existential risks. The combination of human-centric wargaming and AI algorithms faces a notable “black box” challenge, in which the reasoning behind certain outcomes remains unclear. This obscurity, alongside potential biases in AI training data and wargame design, highlights the urgent need for ethical governance and accountability in this evolving domain. Examining these issues shows why responsible oversight is imperative as AI and wargaming merge, a fusion that could decide future conflicts.

Influence without oversight. Wargaming has exploded in popularity: NATO member states, think tanks, and universities are using these tools to examine a range of security issues—from nuclear crises to great power competition. Some wargames seek to educate participants, while others collect data for analysis to inform scholarly theory or government policy.

The revival began in 2015 when the Pentagon called for more wargaming to out-compete major rivals like Russia and China. Now, NATO is developing an “audacious” wargaming capability—a culture shift that encourages critical thinking, experimentation, and cross-pollination of ideas in military strategy and planning to gain strategic advantage. Leading institutions like King’s College London and Stanford University have also established new research centers in this field.

As a result of the revival, wargames have a growing influence on Western leaders. As then-UK Defence Secretary Ben Wallace highlighted in July 2023, “Wargame outputs have been central to [the Ministry of Defence’s] decision-making.” For example, the Secretary of State’s Office of Net Assessment and Challenge has been conducting extensive wargaming, informed by defense intelligence and independent expertise, to ensure current and emerging strategies are thoroughly tested before they are implemented.

In the United States, wargaming is even more prevalent, as the Pentagon habitually uses such simulations to “prepare for actual warfare.” For instance, Hedgemony, developed by the RAND Corporation, was a strategic wargame that played a key role in shaping the Pentagon’s 2018 National Defense Strategy. The game simulated trade-offs in resource and force management, guiding US defense professionals in aligning military capabilities with evolving national strategies and objectives in a dynamic global security environment. RAND, a federally funded research and development center, has been working on wargaming since the late 1940s.

Yet, oversight hasn’t kept pace. In a 2023 King’s College London survey I led, we polled more than 140 wargame designers from 19 countries. The results were concerning: 80 percent of the analytical wargames skipped ethics reviews, ignoring the standard process for research studies that involve human participants. This trend is also reflected in data from the UK Ministry of Defence: According to information obtained via a Freedom of Information Act request, only one study was submitted for research ethics committee review between 2018 and 2023.

Why has wargaming lacked ethics oversight? First, influential guidance, like NATO’s wargaming handbook released this year, fails to outline ethics requirements, even though these games inform real-world decisions. Government sponsors also seldom mandate formal compliance with research ethics standards. Moreover, securing ethical approval can be arduous and time-consuming, conflicting with pressing policy timetables.

The next frontier: Fusing AI and wargaming. Ethical challenges multiply as wargaming embraces AI. Companies and government agencies like the United States’ Defense Advanced Research Projects Agency (DARPA) and the United Kingdom’s Defence Science and Technology Laboratory are spearheading experimental projects on AI-wargaming integration. Notably, the RAND Corporation has toyed with such fusion since the 1980s.

The promises are compelling. A 2023 study from the Alan Turing Institute, the United Kingdom’s top AI hub, found this merger could increase speed and efficiency and improve analysis. AI could rapidly uncover insights from vast amounts of data. Players could experience more immersive games with AI-generated scenarios and adversarial strategies. The expected result? A transformative leap in foresight and strategic advantage over competitors.

However, wargames and AI models share two challenges that raise ethical concerns: lack of explainability (difficulty in comprehending how knowledge is produced) and bias. According to NATO’s and the UK Ministry of Defence’s wargaming guidance, wargames are “not reproducible.” When they are combined with black-box deep learning models, systems whose decision-making process is opaque and not readily interpretable, trust in outcomes diminishes further. Biases in both can arise from limited data or flawed design, potentially leading to erroneous conclusions. Additionally, wargame methods and insights are often classified. Turbocharging them with AI can propagate errors with significant real-world consequences, free from public scrutiny.

Compromising ethical principles. Wargames can carry risks that, without ethical guardrails, could damage players and society.

In realistic games, participants can experience high stress levels, sometimes leading to aggressive behavior similar to the dynamics seen in competitive sports. Also, if player identities can be linked to their game actions and discussions, this could damage people’s professional reputations and even jeopardize their safety. Ethical games—like proper research studies—avoid such pitfalls through careful protocols, such as informed consent and data anonymization.

More broadly, strategic wargames can have both indirect and direct influences on real-world decisions. Players who are or will become real-world decision-makers could be primed by their gaming experiences, possibly affecting future decisions in subtle ways. This is like having a medical trial participant who had an adverse reaction to a drug later decide on that drug’s approval.

To illustrate potential issues, consider a recent university-based wargame that involved NATO staff and uniformed military personnel exploring a Russian invasion of Finland, as reported in The Guardian. If this game were sponsored by an entity like NATO for strategic insights, its outcomes could guide immediate policy or military choices. For instance, if the Russian leadership were unintentionally portrayed as overly aggressive due to hidden biases in the game design or scenario, this could lead to misallocation of defense resources or inadvertent conflict escalation.

Of course, such consequential decisions are unlikely to be made based on the results of a one-off game, but many games with large numbers of players can exacerbate risks. Scale matters.

Now consider a digital AI-powered version of an analytical game deployed at a massive scale. AI risks amplifying existing biases by producing volumes of skewed data that could falsely validate a hypothesis. AI could also craft remarkably persuasive but deceptive narratives that further blur the line between simulation and reality. Ironically, in the eyes of decision-makers, these data-driven insights could add undue credibility to otherwise questionable results.

If wargaming continues to be pivotal in defense decisions, as stated by former UK Defence Secretary Wallace, leaders might view wars as more necessary and winnable than they are in reality. Biased or unexplainable AI-powered games can exaggerate chances of victory or misrepresent adversaries’ intent, priming decision-makers to believe war is essential when diplomatic options remain. This could compromise the ethical principles of just war theory, such as just cause and last resort.

Governing AI wargaming responsibly. Integrating AI’s analytical power with wargaming’s human creativity promises strategic advantage to deter or win future wars. But ethical standards, accountability, and oversight are needed to reap these benefits.

First, experts must develop ethical guidelines for both traditional and high-tech wargames, adapting research standards to account for risks specific to games. These standards must become a cornerstone in government guidelines. Organizations like NATO can provide forums to share best practices, avoiding duplicated efforts.

Second, the challenges of explainability and inherent biases in AI must be addressed through investment in fundamental research. While research on AI explainability gains momentum, few scholars are working on wargaming methodology and epistemology. Multidisciplinary collaboration is needed. Computer scientists should work together with wargaming scholars and practitioners to advance theory.

Third, institutions that conduct and sponsor games must provide oversight. This requires senior leadership buy-in. Because games can subtly influence defense decisions free of public scrutiny, additional checks and balances, such as reviews by legislative bodies, may be required.

Just as machines cracked enemy codes at Bletchley Park to win the war, AI will soon unravel complex strategies to secure peace. Gatherings, such as the AI Safety Summit, can catalyze dialogue and reforms to embed ethical governance into wargaming’s digital future.