In April of 2023, technology company Palantir released a demo of a large language model (LLM)-enabled battle management software called Artificial Intelligence Platform (AIP) for Defense. The platform links together interactive AI-enabled chat-based functionality with seemingly perfect intelligence collection and query. This is paired with course of action generation capabilities for military command decision-making.
Through the course of the demo, the platform notifies a military operator of an enemy formation. Using a chat window, the operator requests and receives more detailed imagery by sending a drone to retrieve video footage, identifying an enemy T-80 battle tank. The operator then asks the platform to generate several possible courses of action, the results of which are sent to a higher command level for further analysis. The commander then picks from the options laid out on the platform’s chat window, and based on compiled and comprehensive geo-spatial intelligence, the Palantir system generates the best route to engage the enemy. The commander then quickly decides to disrupt the adversary’s comms to protect advancing friendly units. In the demo, the software can “automatically” identify the relevant communications nodes and disrupt the adversary’s means to effectively communicate by deploying available jammers. Finally, after reviewing a summary of the operational plan, the commander issues orders, enemy comms are jammed, and forces are tasked to destroy the enemy tank. To borrow sociologist James Gibson’s phrasing, it is truly a “perfect war.”
In the demonstration, the confusion typical of real-life wars is absent and the chaos of battle is managed. The enemy appears as an empty canvas for the platform to enact its capabilities. War with the Artificial Intelligence Platform software on your side, then, looks easy and efficient. Or, as Alex Karp, the CEO of Palantir Technologies, characterized the company’s platform, it’s “a weapon that will allow you to win.”
What does not exist within the confines of the Palantir demo is an enemy with any agency at all, a contingency in which the “information environment” is not completely dominated by the operator of the Artificial Intelligence Platform, or consideration that data used to train the underlying system functionality might be messy, disrupted, or incomplete, or reflect problematic biases. Ingrained into the demo is the assumption of a pristine environment with perfect information and technological performance and an adversary that simply accepts those circumstances. It is a hollow view of war.
While a good case study, Palantir’s platform is not simply a one-off case. There are ongoing discussions about how AI will be used for military planning and command decision-making more generally. The Department of Defense continues to pursue what is known as Joint All-Domain Command and Control, which, at least in part, hopes to integrate artificial intelligence and machine learning into the United States’ command decision-making processes. Moreover, the US Army is working with Scale AI, a data provider and annotation company, and its platform known as Donovan to experiment with how large language models might be able to assist with Joint All-Domain Command and Control. According to one official, “large language models are critical to our Corps’ vision of data centric warfare.”
In the context of these ongoing developments, and putting aside for the moment the fundamental ethical questions surrounding AI’s place in the domain of war, all suggestions for how AI might be integrated into military decision-making should treat the risks of disruption and deception as central to any operational system. To their credit, Marine Corps University professor Ben Jensen and Dan Tadross of Scale AI do point out some of these issues in their recent discussion of using large language models for military planning. However, the dominant model for how AI will link into war should not be the faultless visualization offered in platforms such as Palantir’s. Defense officials, policymakers, and the general public should be wary of such a pristine picture of how technology could transform military conflict.
An enemy with a vote. “The enemy gets a vote” is a common adage, used to ward off notions that any military conflict will go exactly as planned. The basic idea is that even the best laid out operational designs are subject to disruption and unexpected outcomes in the face of adversary forces. Yet, within some emerging perspectives of large language model-enabled platforms in war, there is a distinct lack of capabilities on the part of the imagined enemy, either in terms of putting up resistance within the so-called information domain or with respect to possibilities for deception.
In terms of who has agency in conflict, at least within the context of Palantir’s Artificial Intelligence Platform demo, only one side gets to act. That side employs electronic jamming technology and benefits from sensors and intelligence-fusion capabilities linked to the software, which appear as a sort of sovereign or external observer above the battlefield. It is a confrontation against an adversary whose forces remain as stagnant orange blocks on the screen. Accordingly, significant questions quickly emerge. For example, what if the broader linked intelligence collection system is disrupted? Or suppose the complex architecture supporting the seamless bond between forces on the ground, surveillance drones, and the chain of command is broken? These types of questions echo past worries about technologically enabled military command systems: decision-making paralysis, tendencies toward over-centralization, and forces made over-reliant on technology that is destined, at some point, to break down. As scholars of war and international security have argued, the dreams of military solutionism enabled by information and communications technology or AI are likely overstated.
These are not characteristics any private company would want to point to in a demo of its new product. The cleanness of the demo is, therefore, understandable within that framework. However, any system that portrays war in such a simplistic, one-sided fashion must be pushed to engage with the above inquiries if it aims to take up a place in military conflicts moving forward.
Technical hurdles and the question of trust. Seamless integration of AI systems into decision-making processes is poised to face a problem of trust. In the case of AI-enabled autonomous weapons, studies show that military personnel are predominantly skeptical about being deployed with these systems due to safety, accuracy, and reliability concerns.
Some argue that this kind of reliability can be developed over time with education similar to other military technologies, such as flying by cockpit instruments. While it’s true that training can develop some degree of familiarity, the complexity of AI systems introduces a different dimension to the trust issue. Although advanced, cockpit instruments typically operate within defined parameters and are directly interpretable by trained pilots. Their functions are specific, transparent, and predictable. AI systems, on the other hand, employ complex algorithms and learn from training data in ways that are not transparent. Moreover, the susceptibility of artificial intelligence systems to adversarial attacks further complicates the trust issue.
Black-box models. In a recently published paper, authors affiliated with OpenAI indicate they “do not understand how they [large language models] work.” The black box problem refers to the fact that despite their capabilities, it is often challenging to understand or explain exactly how AI models arrive at specific outputs given certain inputs. This is due to the complex network of “neurons” and the immense number of parameters involved in these models. In practical terms, when a large language model generates a battle plan, it is likely extremely difficult to map out the specific processes and decisions that lead to the final outcome.
Several tools can help mitigate the problem of explainability in AI systems. Saliency maps, for instance, help pinpoint the most significant features in the input data for the model’s decision, through an analysis of gradients, activations, or perturbations. Partial dependence plots, on the other hand, show the direction and the magnitude of the relationship between a feature and the predicted outcome. Shapley values calculate the average contribution of a feature to the prediction. Such methods might help mitigate the problem of explainability, but they are often not intuitive or easily understood by non-experts, thus limiting their effectiveness in promoting transparency and trust in AI systems.
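To make the idea of a Shapley value concrete, here is a minimal sketch in Python. It rests on loud assumptions: the scoring function, its three feature names, and its weights are hypothetical and stand in for no real system, and the exact enumeration of feature orderings is only practical for a handful of features. It shows what "the average contribution of a feature to the prediction" means in practice: each feature's marginal effect is averaged over every order in which features could be switched on.

```python
from itertools import permutations


def toy_threat_score(features):
    """Hypothetical scoring model used only for illustration."""
    armor, radio, speed = features
    # The armor * radio interaction term means a feature's contribution
    # depends on which other features are already present.
    return 2.0 * armor + 1.0 * radio + 0.5 * speed + 1.5 * armor * radio


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Features absent from a coalition are held at their baseline value.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_score = model(current)
        for i in order:
            current[i] = x[i]                      # switch feature i on
            new_score = model(current)
            totals[i] += new_score - previous_score  # marginal contribution
            previous_score = new_score
    return [total / len(orderings) for total in totals]


x = [1.0, 1.0, 1.0]          # hypothetical observation
baseline = [0.0, 0.0, 0.0]   # hypothetical "nothing detected" reference
phi = shapley_values(toy_threat_score, x, baseline)
for name, value in zip(["armor", "radio", "speed"], phi):
    print(f"{name}: {value:.2f}")
```

The contributions sum to the difference between the model's output on the observation and on the baseline, which is what makes the attribution complete. Even in this tiny case, though, the interaction term is split evenly between two features, a choice that is mathematically principled but not something an operator would necessarily find intuitive.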
Defense Department officials understand the problem. Maynard Holliday, the Deputy Chief Technology Officer for Critical Technologies, said that commanders are not going to trust a model without understanding how, and on which data, it was trained. The problem led the Defense Advanced Research Projects Agency (DARPA) to create the explainable artificial intelligence (XAI) program in 2015 to “understand, appropriately trust, and effectively manage” AI systems. With the end of the program in 2021, the XAI agenda seems to have slowed down and the majority of the benefits of these initiatives ended up not being for the end users but for AI engineers who use explainability to debug their models. One expected but meaningful finding from the program was that advisability—the ability of an AI system to receive corrections from users—increased the level of trust more than explainability.
Training-stage problems. Large language models face risks associated with their training data as well. Adversarial attacks during the training phase involve meddling with the dataset, altering input features, or manipulating data labels. Such attempts at poisoning involve the introduction of malicious or incorrect data into a model’s training dataset, with the intent of corrupting the model’s learning and subsequent behavior. While some data-poisoning attacks might simply degrade the performance of AI systems, resulting in general inaccuracies and inefficiencies, more sophisticated attacks could be designed to elicit specific reactions from the system.
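A toy sketch can illustrate how a targeted poisoning attack differs from one that merely degrades performance. Everything here is an assumption made for illustration: the one-dimensional "signature" values, the two labels, and the nearest-centroid classifier are hypothetical and far simpler than any large language model. The point is only that a few mislabeled training points can move a decision boundary so that one chosen input is misclassified while other inputs still look fine.

```python
def train_centroids(samples):
    """Fit a nearest-centroid classifier: one mean value per label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


# Hypothetical clean training data: (signature value, label).
clean = [(0.0, "benign"), (1.0, "benign"), (2.0, "benign"), (3.0, "benign"),
         (7.0, "threat"), (8.0, "threat"), (9.0, "threat"), (10.0, "threat")]

# The attacker injects a few high-valued points mislabeled as "benign",
# dragging the benign centroid toward the threat region.
poison = [(12.0, "benign")] * 4

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

target = 6.0  # the input the attacker wants misread
print("clean model:   ", predict(clean_model, target))
print("poisoned model:", predict(poisoned_model, target))
print("unrelated input still correct:", predict(poisoned_model, 9.0))
```

In this sketch the poisoned model still labels an obvious threat correctly, which is why such tampering could go unnoticed in routine testing.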
Researchers have demonstrated that it is feasible to inject “digital poisons” into web content such as Wikipedia, which is often used for creating training datasets. Hence, the military is intent on training its models exclusively with Department of Defense data. While this is certainly a step in the right direction, it does not completely rule out risks related to non-Department of Defense data, which are required to reach the degree of utility and versatility of models like ChatGPT. A recent Army request for information on protecting its datasets indicates that the search for an answer continues. The Army’s request seeks solutions to challenges at the training stage, including data encryption, security auditing, and data integrity, as well as remediation measures that should be employed if a dataset gets compromised.
Deployment-stage problems. After the deployment of the model, problems will persist. Even the most advanced technical systems should not be considered immune from post-deployment issues, particularly large language model-enabled technology, which is known to act unexpectedly when presented with situations not included in training datasets. More worryingly, studies show that AI models can be susceptible to adversarial attacks even when the attacker only has query access to the model. A well-known category of attacks called physical adversarial attacks (adversarial actions against AI models that are executed in the real world as opposed to the digital domain) can cause the AI to misinterpret or misclassify what it is sensing. Studies highlight that even small-magnitude perturbations added to the input may cause significant deceptions. For instance, just by placing stickers on the road, researchers could fool Tesla’s autopilot into driving into oncoming traffic.
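The effect of a small-magnitude perturbation can be sketched with a toy linear classifier. The assumptions are substantial: the weights and input values are hypothetical, the model is far simpler than a real perception system, and the attacker is assumed to know the weights (a white-box simplification; the query-access attacks mentioned above have to estimate this information instead). The sketch nudges every input feature by at most 0.05 in the direction that lowers the score, which for a linear model is the sign of each weight.

```python
def score(weights, bias, x):
    """Linear decision score: positive means 'vehicle detected'."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(value):
    return (value > 0) - (value < 0)


def perturb(weights, x, epsilon):
    """Shift each feature by at most epsilon against the decision score.

    For a linear model the gradient of the score with respect to the
    input is the weight vector, so stepping against sign(w) is the
    most damaging change under a per-feature budget of epsilon.
    """
    return [v - epsilon * sign(w) for w, v in zip(weights, x)]


# Hypothetical weights and sensor features, chosen only for illustration.
weights = [1.0, -1.0, 0.5, -0.5, 2.0, -2.0, 1.0, -1.0]
bias = 0.0
x = [0.6, 0.4, 0.5, 0.5, 0.55, 0.45, 0.5, 0.5]

epsilon = 0.05
x_adv = perturb(weights, x, epsilon)

print("original prediction: ", predict(weights, bias, x))
print("perturbed prediction:", predict(weights, bias, x_adv))
print("largest per-feature change:",
      max(abs(a - b) for a, b in zip(x, x_adv)))
```

No single feature moves by more than the budget, yet the small shifts add up across all features and flip the decision. The more inputs a model has, the more room there is for this kind of accumulation.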
Deception has historically been a core part of war, giving advantages to militaries that can mislead enemy forces into either delayed action or outright surprise. Military AI systems have proven subject to falling for relatively simple, if creative, tricks. In one well-known case, during a testing scenario, a sentry system was unable to recognize approaching United States Marines who had simply covered their faces with pieces of tree bark. It would be imprudent to expect adversary forces not to try similar tactics, particularly if they are aware of how brittle many AI systems can be. Moreover, AI-enabled systems can display problematic levels of overconfidence in their performance. For example, in 2021, an Air Force targeting algorithm trained on what is known as “synthetic data,” or computer-generated data used to build out datasets that might otherwise be hard to collect, thought it was successfully recognizing objects at an accuracy rate of 90 percent. The true number, however, was closer to 25 percent.
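The gap between what a system reports and how it actually performs can be measured directly once its outputs are checked against ground truth. The sketch below is a minimal illustration with invented numbers: the twenty predictions and their outcomes are hypothetical and chosen only to mirror the 90 percent versus 25 percent figures above, not drawn from the Air Force test.

```python
def overconfidence_gap(confidences, outcomes):
    """Compare mean self-reported confidence with measured accuracy.

    confidences: the model's own probability that each call is right.
    outcomes:    whether each call was actually right (ground truth).
    Returns (mean confidence, accuracy, gap); a positive gap means the
    model is overconfident.
    """
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal, non-empty lists")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(1 for correct in outcomes if correct) / len(outcomes)
    return mean_confidence, accuracy, mean_confidence - accuracy


# Hypothetical evaluation: 20 detections, each reported at 0.9
# confidence, of which only 5 turn out to be correct.
reported = [0.9] * 20
correct = [True] * 5 + [False] * 15

mean_confidence, accuracy, gap = overconfidence_gap(reported, correct)
print(f"reported confidence: {mean_confidence:.2f}")
print(f"measured accuracy:   {accuracy:.2f}")
print(f"overconfidence gap:  {gap:.2f}")
```

A check like this only works when independent ground truth is available, which is precisely what tends to be scarce in the conditions where such systems would be used.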
Utopian war? Historian Duncan Bell suggests that “utopias are engines of world-making, a nowhere that signals the possible future instantiation of a somewhere,” an “elaboration of a hypothetical resolution.” In some accounts of utopia, scientific and technological progress is envisioned as the path towards final realization. In many ways, AI-enabled systems such as Palantir’s Artificial Intelligence Platform construct a vision of utopian war, identifying a future in which advanced technology makes the processes of military decision-making akin to bouncing a few requests for intelligence or courses of action off an AI-enabled chat system. It envisions complete knowledge of the enemy, the capacity for friendly forces to act unburdened by opposition, and the ability to generate a list of reliable plans of attack in only seconds. Thus, such platforms present a “resolution” to some of the core complications of military command (or, as stated in the US Marine Corps command doctrine, the “twin problems of uncertainty and time”), at least for the lucky ones who possess the technology. But as with most utopian visions, potential problems with this projected image of technological proficiency loiter in the background, and they should warn us against accepting such representations at face value.
Even if the aforementioned concerns regarding system disruption and deception are resolved, platforms such as Palantir’s offer us a vision of war where violence and politics are masked behind a sophisticated, highly aestheticized technological display. As a result, war is presented as digital blocks that are knowable and manageable through the help of an AI-enabled system. As scholar Anders Engberg-Pedersen puts it in his work on the links between aesthetic design and warfare, systems akin to the Artificial Intelligence Platform reflect a form of “selective anaesthesia, a resilient numbness to the brute realities of warfare.”
Rather than making war clearer and cleaner, international relations scholars have noted that such systems are just as likely to make it messier. Historically, advanced computationally enabled weapon systems, including AEGIS and Patriot missiles, are known to have targeted and fired upon unintended targets. In more current contexts, researchers Avi Goldfarb and Jon Lindsay have argued that AI-enabled systems designed to slice through the fog of war could also cause more confusion for decision makers. These are the sorts of expectations that should be at the forefront of how analysts, policymakers, and the general public approach the intersection of AI and war.
Importantly, our mental models for how artificial intelligence intersects with war are not trivial considerations to worry about at some point into the future. What appears likely, even despite ongoing well-meaning global efforts to keep lethal autonomous weapon systems away from battlefields, is that AI is set to be further integrated in the domain of war. For instance, Palantir’s Alex Karp recently stated that the company’s software is being used in Ukrainian targeting processes (although it is unclear how similar that software might be to the Artificial Intelligence Platform demo). In July of this year, Karp also authored an OpEd in The New York Times framing the development of military AI as “our Oppenheimer moment” and advocating for the pursuit of AI-enabled military systems in the face of “adversaries who will not pause to indulge in the theatrical debates about the merits of developing technologies with critical military and national security applications.” Moreover, systems with autonomous capabilities are reportedly being deployed on the front lines of the conflict by both Ukrainian and Russian forces, particularly in the form of drones and loitering munitions. As AI becomes further linked with life and death decisions on the battlefield, it’s important to hesitate before accepting the hollow view of AI-enabled conflict.