Spotting AI-generated content is too hard. Look for credible sources instead.

By Nate Sharadin | November 2, 2023

Illustration by Erik English, edited under license via studiostoks / Adobe

If it follows the pattern set in the last two US presidential elections, the 2024 presidential race could be decided by fewer than 100,000 votes cast in fewer than four states. It seems possible that fake but credible-sounding audio of a presidential candidate using a racial slur, released just before the election with no time for an adequate media or official response, could affect turnout in exactly the counties needed to swing the election.

Deep generative models, such as those that power OpenAI’s ChatGPT and DALL-E 3, Google’s PaLM, and Meta’s (open-source) Llama 2, are capable of extraordinary things. These models can generate audio, visual, and text-based content that is indistinguishable from content created by humans. These capabilities are exciting. The prospect of cheap, easy-to-access image, audio, and text generation promises to revolutionize the way we work, play, and socialize.

But these new capabilities are alarming, too. Despite what some companies claim, it’s not currently possible to reliably detect whether a piece of text-based content was generated using a machine learning model. And the “watermarking” of model-generated audio-, image-, and video- based content is vulnerable to easy exploitation and evasion.

So, there’s growing interest in enabling human beings to spot the bot—that is, to distinguish model-generated content, also known as synthetic media, from human-generated content across a variety of modalities, including images and audio. But even that is proving ineffective: As generative AI continues to improve, it’s becoming increasingly difficult for humans to spot model-generated content. The focus should instead be on educating the public about the capabilities of current AI models and on trusting only content that comes from verified sources, such as credible news organizations.

The problem with watermarking. To be as clear as possible: there is not now, and never will be, an automated, reliable technique for deciding whether a piece of text was written by a state-of-the-art generative model or by a human. Reflect on the nature of text for just a moment, and it should be clear why. Unlike digital images and audio, there is simply nowhere in text-based content to locate a watermark, and the evidence regarding the indistinguishability of human- from model-generated text is already in: People (including those who read for a living) can’t tell the difference. Anyone who claims otherwise is selling something (or attempting to avoid a specific kind of regulation).

In fact, after announcing plans for a detector of model-generated text in January, OpenAI quietly pulled it in July, writing in an update to its webpage that “the [detector] is no longer available due to its low rate of accuracy. We are working to incorporate feedback and are currently researching more effective provenance techniques for text.”

The internet has been flooded with text-based misinformation and disinformation since long before widespread access to large language models. So the inability to distinguish AI-generated text from human-made text may be bad, but not truly catastrophic.

The prospects for serious harm are more worrying with audio- and image-based content than with text. Model-generated visual content isn’t limited to viral pictures of ex-presidents being arrested or swagged-out popes. It’s possible to generate photorealistic images of soldiers participating in war crimes (leading to riots), of corporate executives meeting with competitors for secret backroom deals (possibly altering markets), or of politicians taking bribes or shaking hands with terrorists (thereby changing election outcomes).


On Monday, President Biden issued an executive order outlining recommendations that, among other things, included calls for watermarking content created by artificial intelligence models. For one, federal agencies will be required to provide watermarks or labels when using generative AI in their communications. This might seem like a good sign. But content authentication of this sort, within the very small ecosystem of, for example, notices from the IRS or other federal agencies, doesn’t begin to address the public’s current problem: Highly capable generative models make it possible to create arbitrary quantities of inflammatory and harmful content.

This week’s executive order builds on a set of voluntary agreements made in July between the Biden White House and large AI developers like OpenAI, Google and Meta, who have promised to “develop tools or APIs [application programming interfaces] to determine if a particular piece of content was created with their system.”

Perhaps voluntary agreements of this kind could help—especially if they were hardened into regulation; intuitively, at least, tools for detecting model-generated content hold more promise of mitigating the relevant risks. But the promised tools are red herrings. They do not reduce the risk of being duped by model-generated image- and audio-based content. Pay close attention to what model developers are promising: tools (or application programming interfaces) that determine whether a piece of content was generated using their system. They are promising, in effect, potentially quite sophisticated cryptographic watermarks for image- and audio-based content. But there are two problems with this approach.

The first, and most serious, problem is that watermarking doesn’t in fact work to reduce the risk of mistaking model- for human-generated content. Researchers have demonstrated over and over again that every watermarking technique proposed so far is vulnerable to exploitation and evasion. And in any case, all of these techniques rely on watermarking the image or audio files themselves. Take a picture of an image with your phone, and the watermark is gone. Record a snippet of an audio file with your phone, and again the watermark is gone.
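The fragility of file-based watermarks can be illustrated with a toy example. The sketch below is entirely hypothetical—real schemes embed marks far more robustly than this naive least-significant-bit approach—but it shows the same "analog hole": any mark carried in fine-grained pixel detail can be wiped out by the lossy round trip of photographing a screen.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 8-bit grayscale "image" standing in for a model-generated picture.
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Hypothetical watermark: 64 bits hidden in the least significant bits
# of the first 64 pixels.
watermark = rng.integers(0, 2, size=64, dtype=np.uint8)
marked = image.copy()
flat = marked.reshape(-1)               # view into `marked`
flat[:64] = (flat[:64] & 0xFE) | watermark

def extract(img):
    """Read the embedded bits back out of the low-order pixel bits."""
    return img.reshape(-1)[:64] & 1

# Before any re-capture, the watermark reads back perfectly.
assert np.array_equal(extract(marked), watermark)

# "Take a picture of the image with your phone": simulate the lossy
# round trip with coarse quantization plus a little sensor noise.
recaptured = (marked.astype(np.int16) // 8) * 8
noise = rng.integers(-2, 3, size=recaptured.shape)
recaptured = np.clip(recaptured + noise, 0, 255).astype(np.uint8)

# The fine detail carrying the mark is destroyed; the extracted bits
# now match the original watermark only about as often as chance.
survived = float(np.mean(extract(recaptured) == watermark))
print(f"watermark bits surviving re-capture: {survived:.0%}")
```

The point of the sketch is not that real watermarks use this scheme—they don't—but that any signal living in the file's fine structure, however cleverly encoded, must survive an analog re-capture the watermarker does not control.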

Perhaps governments could adopt regulations that require people disseminating or using content created by generative models to clearly identify it as such.

Tech companies have already moved toward adopting such a regime. And if backed by legislation, a system like this could in effect make watermarking an industry standard and even result in a world where removing watermarks is illegal. But even if including watermarks on model-generated content becomes an industry standard, and removing them becomes illegal, such regulation is effectively toothless so long as it remains impossible to tell whether a piece of content that carries no watermark was generated by a model or by a human.

Consider the following analogy: It is illegal to use homemade currency to buy goods and services, and all homemade currency must be clearly labeled as such. But assume that it is possible to easily remove such labels, and that it is (as a result) impossible to tell whether a particular unit of currency is homemade or issued by the government. People wouldn’t be expected to trust that any particular bill in circulation was issued by the government. The same goes, here, for model- and human- generated content.


The second problem with calls for watermarking involves the open-source nature of many very capable generative models. In a world where there are open-source models capable of generating content indistinguishable from human-generated content, the only thing a person can learn from a piece of content that does not have a watermark is that it was not created using a particular developer’s closed-source model that does produce watermarked content. But this does not tell the person what he or she wants or needs to know: Was this piece of non-watermarked content generated by a model or by a human?

Public education as the start of a solution. What, then, should be done to reduce the flood of new risks from model-generated image- and audio-based content? Rather than promising tools that don’t address the underlying problem, regulators and model developers should change tack. What is required is widespread, systematic public education. Recent research—much of it motivated by the perceived effect of misinformation on the 2016 US election—suggests that media and information literacy interventions can mitigate the effects and spread of fake news on social media. The case of distinguishing model- from human-generated content (as opposed to distinguishing fake news from genuine news) presents special challenges. But lessons from that research can be adapted to the present context.

As a first step, the broad public needs to understand exactly how capable present models are. Model developers should therefore publicly, visibly, and repeatedly demonstrate to the public how their models can easily be used to generate audio and visual content that’s indistinguishable from human-generated content. They should do this in controlled settings, with expert explanations of what enables the technology to work. They should also provide detailed accounts of the limits of existing detection techniques. These accounts need to acknowledge that existing techniques, such as watermarking, cannot provide assurance that a piece of content was not generated using a model. These efforts will help the public understand their own limits in distinguishing between human- and AI- generated media.

Again, education will not enable the public to “spot the bot”: indistinguishable things cannot be distinguished. Instead, public education should focus on teaching people not to trust the audio, visual, and text-based content they encounter unless it is from a known, vetted, reliable source. This is a difficult skill to acquire and exercise. It requires overcoming humans’ default disposition to trust the testimony of others. The rise of social media—and media fragmentation more generally—has made the challenge of knowing exactly when to trust what one hears, sees, or reads even more difficult. But the rise of generative models capable of creating synthetic media makes developing this epistemic skill essential for navigating a world where we can’t spot the bot.

