To defend against malicious AI, the United States needs to build a robust digital immune system

By Ali Nouri | August 5, 2025

Capitol Hill with a padlock hologram. The United States must add a third pillar to its AI strategy: systems that actively defend against malicious use, specifically AI that can fight back. Image: VideoFlow via Adobe

Artificial intelligence is delivering breakthroughs—from life-saving drugs to more efficient industries—but as a dual-use technology, it can also be misused for destructive ends. Policymakers have responded by restricting chip exports to adversaries and urging developers to build safe AI, hoping to slow misuse or enforce better norms. But too often, these efforts treat AI only as a threat to contain, rather than a tool to help solve the very risks it creates.

To confront 21st century threats, society needs to deploy contemporary tools—namely AI itself. Export controls and ethical pledges may slow competitors or promote better behavior, but they can’t keep up with a technology that’s cheap to copy, easy to repurpose, and spreading at internet speed. To stay safe, the United States must add a third pillar to its AI strategy: systems that actively defend against malicious use, specifically AI that can fight back.

Dubbed defensive AI, models built for this purpose can monitor, detect, and respond to anomalies in real time. Trained on vast troves of normal activity as well as attack patterns—like phishing emails, credit‑card fraud, and malicious DNA designs—these models learn what “normal” system behavior looks like, so they can quickly flag deviations and take steps to contain or neutralize threats. Such AI functions like a digital immune system, spotting abnormalities and responding before humans even know something is wrong.
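
To make the pattern concrete, below is a minimal, hypothetical sketch of the “learn what normal looks like, then flag deviations” approach, using scikit-learn’s IsolationForest on made-up activity features (requests per minute, bytes sent, failed logins). The features, numbers, and thresholds are illustrative only, not drawn from any deployed system.

```python
# Minimal sketch of the "learn normal, flag deviations" pattern described above.
# The features and numbers are hypothetical; a real deployment would train on
# genuine operational telemetry and feed alerts into a response workflow.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Train only on "normal" activity: (requests per minute, bytes sent, failed logins).
normal_activity = rng.normal(loc=[20, 5_000, 0.2], scale=[5, 1_000, 0.3], size=(10_000, 3))

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal_activity)

# New events stream in; anything far outside learned "normal" behavior is flagged.
incoming = np.array([
    [22, 5_200, 0],      # looks like routine traffic
    [400, 90_000, 12],   # burst of requests, huge payload, many failed logins
])
for event, label in zip(incoming, detector.predict(incoming)):  # +1 normal, -1 anomaly
    if label == -1:
        print("ALERT: anomalous activity detected:", event)
```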

While AI companies talk about safety, most of their investment still goes toward scaling up the capabilities of general-purpose models. Defensive AI is different: It’s not built just to generate outputs, but to recognize and stop threats. Given AI’s growing role in education, science, entertainment, media, defense, and nearly every other sector—along with the vulnerabilities it introduces—policymakers and developers must invest in defensive AI systems capable of protecting our rapidly expanding digital infrastructure.

The limits of containment. For the past two administrations, the centerpiece of US policy on AI has been denial: starving adversaries of the chips and software needed to train cutting‑edge AI models. Washington now bars exports of Nvidia’s top-tier AI chips to China and other countries of concern, restricts the lithography machines that build those chips, and requires America’s cloud giants to flag foreign customers training large models on US servers.

Yet containment has its limits. Denied top-shelf hardware, China has doubled down on building the entire stack—meaning not just chips, but the full infrastructure needed to train and run advanced AI systems, including data center architecture, networking, software, and models—domestically. Huawei’s Ascend 910C accelerators, clustered in its CloudMatrix architecture, recently outperformed Nvidia’s H800 when running DeepSeek’s 671-billion-parameter model. While individual Chinese chips still lag behind Nvidia’s H100 or GB200 in raw performance, Huawei compensates by networking hundreds of domestically designed and foreign-sourced chips, including some acquired through sanctions loopholes. And thanks to abundant domestic energy, it can afford to scale aggressively. The result isn’t parity chip for chip, but a vertically integrated system with advanced scale-out networking that delivers competitive throughput and latency, narrowing the gap between US and Chinese AI capabilities faster than many analysts predicted.

The alignment bet. Here at home, regulators, civil society and consumer advocates, and some AI developers have pinned their hopes on corporate guardrails—voluntary or lightly mandated checks designed to keep models fair and prevent them from being weaponized for hacking or biothreats. Labs like Google, OpenAI, and Anthropic have taken the lead in developing such safeguards. Anthropic, for instance, prevents its models from assisting users seeking information related to chemical, biological, radiological, and nuclear (CBRN) weapons. As a result, a malicious actor trying to use the model to learn how to make sarin gas would be blocked from accessing that information.

These safeguards help but have serious limitations. For example, Anthropic’s newer models are harder to trick into revealing disallowed content—so-called “jailbreaks,” in which users disguise a harmful request. But that protection holds only for certain evaluation sets. In other scenarios, even the lab most closely associated with “safety first” can’t stay ahead of its own breakthroughs: During recent red-team drills, Anthropic’s powerful Opus 4 model produced detailed instructions for acquiring dangerous viruses and carried out an unassisted cyber intrusion across multiple computers. The broader problem is structural: In a race to build the most powerful models and attract users, far more money flows toward capability than toward safety.

Meanwhile, open-source models with comparable power are proliferating on the developer platform GitHub, often fine‑tuned to run on exactly the domestic chips Huawei and others are shipping by the hundred thousand. In that decentralized ecosystem there are no enforceable rules, no gating mechanisms, and no practical way to stop a determined actor from turning helpful code into a weapon.

Enter defensive AI. Sometimes called “AI-native defenses,” these systems are not bolt-on security measures—they are agents embedded within the broader agentic AI framework, designed to operate autonomously in the same environments as the systems they monitor. Defensive AI models are trained for a specific mission: to detect misuse, flag anomalies, and act quickly enough to blunt harm before it spreads.

Early versions of defensive AI already protect critical sectors. Banks rely on anomaly‑detection models to flag fraudulent transactions within milliseconds, and major email providers use similar tools to catch phishing attempts that slip past rule‑based filters. To scale those successes, policymakers should support extending these strategies to high‑salience domains like cybersecurity, biosecurity, and disinformation, and to other sectors where AI misuse poses serious risks.

Cybersecurity. AI‑driven hacking tools are increasingly capable of identifying and exploiting cyber vulnerabilities in critical infrastructure like power grids, hospitals, and water utilities. A defensive model trained on normal grid telemetry can flag silent code execution at a substation or anomalous voltage flows before equipment is damaged. Embedded directly in industrial‑control systems, it would function around the clock as an always‑on intrusion-detection nerve layer.
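
As a simplified illustration of that kind of baseline monitoring, the sketch below flags substation voltage readings that deviate sharply from a rolling baseline of recent “normal” values. The readings, window size, and threshold are invented for illustration; a real system would learn per-substation baselines across many signals and route alerts to grid operators.

```python
# Toy illustration of flagging anomalous substation voltage telemetry.
# Values and thresholds are invented; real systems would learn baselines
# per feeder and combine many signals (voltage, frequency, control traffic).
from collections import deque
import statistics

class VoltageMonitor:
    def __init__(self, window: int = 60, threshold_sigma: float = 4.0):
        self.history = deque(maxlen=window)   # recent "normal" readings
        self.threshold = threshold_sigma

    def observe(self, reading_kv: float) -> bool:
        """Return True if the reading deviates sharply from the recent baseline."""
        if len(self.history) >= 30:
            mean = statistics.fmean(self.history)
            stdev = statistics.stdev(self.history) or 1e-9
            if abs(reading_kv - mean) > self.threshold * stdev:
                return True   # anomalous: alert, and keep it out of the baseline
        self.history.append(reading_kv)
        return False

monitor = VoltageMonitor()
for v in [138.0, 137.8, 138.2] * 20 + [152.5]:   # steady readings, then a spike
    if monitor.observe(v):
        print(f"ALERT: anomalous voltage {v} kV")
```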

Biosecurity. Automated labs can now assemble DNA, run robotic cell cultures, and use machine learning to design proteins. A defensive model inside the synthesis pipeline can compare every requested sequence against a rolling database of pathogen fragments, halting production if it detects toxin genes or unusual combinations. Turning the lab’s own automation into a sentinel converts speed from a liability into a safeguard.
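
The sketch below illustrates one simple version of that comparison step: checking whether a synthesis order shares enough short subsequences (k-mers) with a stand-in list of flagged fragments. The fragment, threshold, and function names are hypothetical; real screening pipelines rely on curated pathogen databases and far more sophisticated sequence matching.

```python
# Hypothetical sketch of screening a synthesis order against flagged fragments
# via shared k-mers. The fragment below is an arbitrary placeholder sequence,
# not a real pathogen gene.
def kmers(seq: str, k: int = 20) -> set[str]:
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Stand-in "database"; a real one would hold curated sequences of regulated
# pathogens and toxin genes, maintained and updated by screening authorities.
FLAGGED_FRAGMENTS = [
    "ATGGCTAGCTAGGCTTACGATCGATCGGATCCTAGGCTAAGT",
]
FLAGGED_KMERS = set().union(*(kmers(f) for f in FLAGGED_FRAGMENTS))

def screen_order(order_seq: str, min_hits: int = 5) -> bool:
    """Return True if the order shares enough k-mers with flagged fragments to halt synthesis."""
    hits = len(kmers(order_seq) & FLAGGED_KMERS)
    return hits >= min_hits

order = "TTTT" + FLAGGED_FRAGMENTS[0] + "GGGG"   # order embedding a flagged fragment
if screen_order(order):
    print("HALT: order matches a flagged fragment; escalate for human review")
```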

Disinformation. During geopolitical crises, social media is flooded with recycled video, fake audio, and AI‑generated images. Models trained on verified archives and platform behavior signatures could be built to flag synthetic artifacts or coordinated bot activity before they trend. Yet current detection tools are not keeping up. AI-generated disinformation already floods product reviews, social media posts, and search results, misleading users and undermining trust. Defensive AI must be engineered specifically to detect and stop this kind of manipulation before it spreads.
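
One concrete coordination signal is many accounts pushing near-identical text within a short window. The sketch below is a hypothetical illustration of that single signal; the account names, text, and thresholds are invented, and production systems combine many more behavioral and content features.

```python
# Illustrative sketch of one coordination signal: many accounts posting
# near-identical text within a short window. Accounts, text, and thresholds
# are hypothetical.
from collections import defaultdict
import re

def normalize(text: str) -> str:
    """Collapse case, whitespace, and URLs so near-duplicates bucket together."""
    text = re.sub(r"https?://\S+", "<url>", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def flag_coordination(posts, min_accounts: int = 10, window_s: int = 600):
    """posts: iterable of (timestamp_s, account_id, text). Yields suspicious clusters."""
    buckets = defaultdict(list)                 # normalized text -> [(time, account)]
    for ts, account, text in posts:
        buckets[normalize(text)].append((ts, account))
    for text, events in buckets.items():
        events.sort()
        accounts = {a for _, a in events}
        span = events[-1][0] - events[0][0]
        if len(accounts) >= min_accounts and span <= window_s:
            yield text, sorted(accounts)

# Example: 12 accounts pushing the same message within 10 minutes get flagged.
posts = [(i * 30, f"acct_{i}", "BREAKING: see https://example.com/x  ") for i in range(12)]
for text, accounts in flag_coordination(posts):
    print(f"Possible coordinated activity: {len(accounts)} accounts posted {text!r}")
```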

A policy blueprint. Defensive AI will not build itself. To develop these systems, Washington should take the following steps:

  • Invest in innovation: Channel research and development dollars—through mechanisms like DARPA, the Energy Department’s national laboratories, and an expanded AI Safety Institute—into purpose‑built defensive models. Off‑the‑shelf chatbots won’t defend the grid or screen DNA orders, but specialized systems can. DARPA has already demonstrated the potential of such tools. Its AI Cyber Challenge showed that AI-driven systems could automatically detect and patch software vulnerabilities in defense-related domains. Now, similar efforts must scale to other areas of concern, including biosecurity, cybersecurity, and disinformation.
  • Prove out scalable adoption: Defensive AI should be inherently appealing to adopt—AI-native tools that are designed to be intuitive, lightweight, and easy to integrate, even for smaller organizations with limited technical teams. The real need is validation and assurance: Organizations want to know that if they implement a defensive agent, it will reliably protect them. A public-private coalition—linking standard-setting bodies, national security agencies, and industry consortia—should define clear standards, develop secure reference implementations, and rigorously test and validate these tools so they can scale confidently across sectors.
  • Support widespread deployment: To ensure defensive AI reaches beyond large enterprises, national policy must prioritize scale and accessibility. Small utilities, school systems, and local governments need deployment playbooks, validated tools, and public infrastructure support.
  • Put practitioners at the table: Scientists, engineers, and frontline operators—those who understand the systems at risk—must help shape and continuously update these tools. Policy made without their input risks sounding smart on paper but failing in the real world.
  • Secure the defenders: Defensive AI must be robust. These models will ingest sensitive operational data, so privacy‑by‑design practices and secure data pipelines are non‑negotiable. And because they will sit deep inside critical infrastructure, they must be hardened against tampering—subject to the same red‑teaming, zero‑trust architecture, and supply‑chain scrutiny demanded of any national‑security asset.

The stakes. The race between attackers and defenders will never end; it will evolve like pathogens that evade the vaccines made to fight them. But the United States can still shape the playing field. By pairing containment and alignment with a third pillar—defensive AI that protects critical systems—policymakers and developers can move from hoping for safety to engineering it. Humans have done it before, developing antibiotics against bacteria and creating firewalls to deter hackers. The next logical step is obvious. If people want AI’s benefits without courting catastrophe, they must build systems optimized not for clicks or cost cutting but for protection.

