Imagine the near future of artificial intelligence (AI) models. One model might be able to replicate the email-writing styles of others. Mimicking a virologist, it could convince the virologist's colleagues to provide sensitive information on a newly identified viral mutation. Another might evaluate existing chemotherapies and suggest variations that make the drugs even more toxic, including to non-cancerous cells. Yet another model might be able to analyze a failed genetic engineering experiment and, through generated images, guide a beginner toward an improved protocol for making antibiotic-resistant bacteria.
The capabilities these scenarios describe are close to what the current crop of advanced, publicly available AI tools can already do. The day when a bad actor could use such a model to develop, say, a biological weapon may not be far off. But how can these risky uses of AI be deterred without curtailing the extraordinary potential of these tools to accelerate beneficial biotechnology development?
President Joe Biden unveiled his Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence last week to coincide with an international AI safety summit in the United Kingdom. It’s a 111-page document calling for studies, guidelines, and precautions that aim to get ahead of the potential dangers of a very new and rapidly improving technology. Among several potentially risky applications of AI, the order singles out biosecurity as a particular area of focus for guidance. It calls for the development of new evaluation tools and testbeds (software interfaces that can prompt or even retrain a model in a variety of ways that probe its potential weaknesses) so that developers can conduct red-teaming tests, which assess how a system might be misused, and then deploy measures to curb the risks they identify. At this point, the scope of those risks is unclear even to practitioners.
The biosecurity concerns raised by new AI systems largely stem from two categories of AI model. One is the large language model, which predicts the next word in a sentence to produce compelling language (think ChatGPT); the other is the so-called “biodesign” tool. Large language models use a conversational, natural-language interface to predict and generate text or images based on the large datasets of content used to train them; having read hundreds of marketing emails, a large language model could easily generate marketing copy for a new, made-up product, for example. Biodesign tools, such as AlphaFold, have been trained on DNA sequences or protein structures and use AI to identify the structure or sequence that best matches what they have learned about the way proteins fold or the way DNA is constructed in nature.
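To make “predicting the next word” concrete, here is a deliberately tiny sketch. Real large language models are neural networks trained on vast corpora and do not simply count word pairs; this toy bigram counter, with an invented three-line “marketing” corpus and hypothetical function names, only illustrates the basic idea of generating text from patterns in training data.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str) -> Optional[str]:
    """Return the word most often seen after `word`, or None if never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts: dict[str, Counter], start: str, max_words: int = 8) -> str:
    """Greedily extend `start` one predicted word at a time."""
    words = [start.lower()]
    while len(words) < max_words:
        nxt = predict_next(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


# Invented training text standing in for "hundreds of marketing emails".
corpus = [
    "try our new product today",
    "order our new product now",
    "our new offer ends today",
]
model = train_bigrams(corpus)
print(generate(model, "our", max_words=3))  # our new product
```

The greedy “most common follower” rule is chosen only for readability; production models instead sample from a learned probability distribution over a large vocabulary.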
Researchers can use these tools to analyze and develop proteins and other biological constructs. But the tools can also produce designs for a variety of chemical weapons, including some as-yet-unknown candidate compounds predicted to be more toxic than VX, a potent and banned nerve agent. In ad hoc red-teaming exercises, large language models have readily provided the steps to synthesize the 1918 pandemic influenza virus or suggested companies that will synthesize DNA orders without screening them for pathogens or restricted agents.
It’s true, of course, that using a large language model to obtain publicly available genetic sequence information or a list of potentially less-than-thorough DNA synthesis providers would still leave a would-be bad actor several substantial steps away from recreating the influenza virus, or from turning synthesized DNA into a functional biological product. The tacit knowledge gained from working in a laboratory or studying virology is difficult, if not impossible, to convey through a chatbot. And so far, no one has claimed that AI models are providing classified information or suggesting previously unknown pathways to bioweapons development.
The executive order tasks government agencies with answering a different question: How likely is it that AI tools will develop these capabilities in the near future, and what is the worst a talented interlocutor could accomplish with them in the meantime?
Alongside evaluating the risks of AI tools, the order singles out two known risks for immediate intervention. As it stands, synthetic DNA order screening is a patchwork of voluntary compliance efforts by individual companies. Someone wishing to obtain potentially harmful genetic material might be able to contact a company that doesn’t check what it’s synthesizing or for whom. The order directs the White House Office of Science and Technology Policy to develop a screening framework for synthetic DNA requests and then requires federally funded purchasers of synthetic DNA to use only providers that abide by that framework.
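The paragraph above describes screening along two axes: what is being synthesized and for whom. A minimal sketch of that logic follows, assuming a simple exact k-mer match against a hypothetical database of sequences of concern plus a customer-verification flag. Production screening relies on homology search against curated databases and on human review; the function names, the 20-base window, the decision labels, and the placeholder sequence are all illustrative assumptions, not part of any official framework.

```python
# Toy screening sketch: exact k-mer matching against a hypothetical list of
# sequences of concern. Real screening uses homology search and curated
# databases; every name and sequence here is a placeholder.


def kmers(seq: str, k: int) -> set[str]:
    """Return all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_order(order_seq: str,
                 concern_db: dict[str, str],
                 customer_verified: bool,
                 k: int = 20) -> tuple[str, list[str]]:
    """Screen one synthesis order; return (decision, names of matched entries)."""
    order_kmers = kmers(order_seq, k)
    # An entry "hits" if it shares at least one length-k window with the order.
    hits = [name for name, ref in concern_db.items()
            if order_kmers & kmers(ref, k)]
    if not hits:
        return "approve", hits
    # Screening covers both what is ordered and who is ordering it.
    if customer_verified:
        return "manual_review", hits
    return "reject", hits


# Placeholder database entry; not a real sequence of concern.
CONCERN_DB = {"placeholder_entry": "ATGCGTACGTTAGCCTAGGCTAACGTTGCA"}

order = "TTTT" + CONCERN_DB["placeholder_entry"][:20] + "TTTT"
print(screen_order(order, CONCERN_DB, customer_verified=False))
# ('reject', ['placeholder_entry'])
```

Exact matching is used here only to keep the sketch short; it is easily evaded by small sequence changes, which is one reason real frameworks use similarity search instead.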
In another move, the order highlights the potential risks of allowing AI models to be trained on large biological datasets, including genetic sequence information and protein structures, and calls for consideration of whether access to these repositories should be restricted. One such dataset on which AI models for biological design have been trained is NCBI GenBank, a US government-operated repository of sequence information. By suggesting that GenBank could be made off limits to AI models specifically, rather than restricting all access to the data, the order implicitly assumes that AI models trained on, and able to access, large amounts of biological data will exceed the capabilities of researchers using the repositories without AI assistance. Presumably, a sufficiently advanced model could learn from sequence data how to engineer a new sequence with desired functions, including pathogenicity or toxicity.
Export controls are quite properly used to police the international flow of products and technologies with national security implications, such as components related to nuclear weapons. But the federal government has overstepped in regulating emerging technologies before. Before 1996, export controls on encryption and cryptographic software, the programs used to protect computer systems from intrusions, treated publishing an encryption algorithm as akin to exporting a munition. And the government could go too far again with efforts to regulate AI. Restrictions on AI will, first, require a clear-cut definition of “artificial intelligence.” The order approaches this by setting thresholds for the amount of computing power used to train a model and for the computing capacity of the clusters used to train it. To some, this looks like a ban on doing too much math without government approval, but a practical definition will help identify the set of models to which the order applies, which may also prove to be the riskiest and the most in need of oversight.
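As we read Section 4.2(b) of the order, its interim reporting criteria cover models trained with more than 10^26 integer or floating-point operations, or more than 10^23 operations for models trained primarily on biological sequence data. The sketch below applies those two thresholds. It also assumes the widely used rule of thumb that training a dense transformer costs roughly 6 operations per parameter per training token; that heuristic, the function names, and the example model size are illustrative assumptions, not part of the order.

```python
# Training-compute thresholds from the order's interim reporting criteria.
GENERAL_TRAINING_OPS = 1e26        # integer or floating-point operations
BIO_SEQUENCE_TRAINING_OPS = 1e23   # models trained primarily on biological sequence data


def estimate_training_ops(n_params: float, n_tokens: float) -> float:
    """Rough training compute via the ~6 * parameters * tokens heuristic
    for dense transformers (an assumption, not the order's method)."""
    return 6.0 * n_params * n_tokens


def exceeds_model_threshold(training_ops: float, primarily_bio_data: bool) -> bool:
    """True if training compute is above the threshold that applies to this model."""
    threshold = BIO_SEQUENCE_TRAINING_OPS if primarily_bio_data else GENERAL_TRAINING_OPS
    return training_ops > threshold


# Hypothetical model: 70 billion parameters trained on 2 trillion tokens.
ops = estimate_training_ops(70e9, 2e12)  # about 8.4e23 operations
print(exceeds_model_threshold(ops, primarily_bio_data=False))  # False
print(exceeds_model_threshold(ops, primarily_bio_data=True))   # True
```

The same hypothetical training run falls well under the general threshold but over the biological-data one, which shows how much the lower bar for sequence-trained models widens the set of covered systems.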
Definitions of AI systems that allow for comparisons of various AI tools will be particularly useful to biosecurity researchers. The distinction between the capabilities of a large language model, like GPT-4, and AI-enabled biodesign tools, like AlphaFold, which predicts protein structures, is often overlooked and will need to be clarified through the work outlined in the order, which does not distinguish between these types of AI model.
Within the cohort of large language models, performance varies depending on the test being used or the framing of the prompt given to the model. Among AI biodesign tools, models with specialized capabilities pose different types of risk. A tool trained on antibody-binding data may do a good job designing a protein that evades antibodies (think of the way mutations to the SARS-CoV-2 spike protein changed its ability to be detected by the immune system) but a poor job when asked to design a toxin that makes holes in cellular membranes. The challenge for the agencies tasked with developing evaluation tools and testbeds will be to define standardized tests that capture the range of tasks different AI tools perform, so that their risks can be assessed appropriately.
Recently, colleagues at Los Alamos National Laboratory and I have begun outlining the design of a testbed and risk-assessment protocol for AI-enabled biodesign tools. The new executive order directs the Department of Energy to develop just such model evaluation tools, testbeds, and guardrails. The technology will not stand still while that work proceeds: just five months separated the release of two powerful AI models, GPT-3.5 and GPT-4, both products of ChatGPT maker OpenAI. The order gives agencies about nine months to complete their AI policy recommendations, and by then there could be newer and more powerful large language models and biodesign tools. Managing them will require assessment and risk-mitigation tools flexible enough to apply to AI models with as-yet-unimagined capabilities.