
Spotting AI-generated content is too hard. Look for credible sources instead.

By Nate Sharadin | November 2, 2023

Illustration of a robot holding a ripped picture of a person's face. Illustration by Erik English, edited under license via studiostoks / Adobe

If it follows a pattern set in the last two US presidential elections, the 2024 presidential race could be decided by fewer than 100,000 votes cast in fewer than four states. It seems possible that fake but credible-sounding audio of a presidential candidate using a racial slur, released just prior to the election without time for an adequate media or official response, could affect turnout in exactly the counties needed to swing the election.

Deep generative models, such as those that power OpenAI's ChatGPT and DALL-E 3, Google's PaLM, and Meta's (open-source) Llama 2, are capable of extraordinary things. These models can generate audio, visual, and text-based content that is indistinguishable from content created by humans. These capabilities are exciting. The prospect of cheap, easy-to-access image, audio, and text generation promises to revolutionize the way we work, play, and socialize.

But these new capabilities are alarming, too. Despite what some companies claim, it's not currently possible to reliably detect whether a piece of text-based content was generated using a machine learning model. And the "watermarking" of model-generated audio-, image-, and video-based content is vulnerable to easy exploitation and evasion.

So, there's growing interest in enabling human beings to spot the bot, that is, to distinguish model-generated content (also known as synthetic media) from human-generated content across a variety of modalities, including images and audio. But even that is proving ineffective: As generative AI continues to improve, it is becoming increasingly difficult for humans to spot model-generated content. The focus should therefore be on educating the public about the capabilities of current AI models and on trusting only content that comes from verified sources, such as credible news organizations.

The problem with watermarking. To be as clear as possible: There is not now, and will not be in the future, an automated, reliable technique for deciding whether a piece of text was written by a state-of-the-art generative model or by a human. Reflect on the nature of text for just a moment and it should be clear why. Unlike digital images and audio, there is simply nowhere to locate a watermark in text-based content, and the evidence regarding the indistinguishability of human- from model-generated text is already in: People (including those who read for a living) can't tell the difference. Anyone who claims otherwise is selling something (or attempting to avoid a specific kind of regulation).

In fact, after announcing plans for a detector of model-generated text in January, OpenAI quietly pulled it in July, writing in an update to its webpage that "the [detector] is no longer available due to its low rate of accuracy. We are working to incorporate feedback and are currently researching more effective provenance techniques for text."

The internet has been flooded with text-based misinformation and disinformation since long before widespread access to large language models. So the inability to distinguish AI-generated text from human-written text is bad, but it is not truly catastrophic.

The prospects for serious harm are more worrying with audio- and image-based content than with text. Model-generated visual content isn't limited to viral pictures of ex-presidents being arrested or swagged-out popes. It's possible to generate photorealistic images of soldiers participating in war crimes (leading to riots), of corporate executives meeting with competitors for secret backroom deals (possibly altering the market), or of politicians taking bribes or shaking hands with terrorists (thereby changing election outcomes).


On Monday, President Biden issued an executive order outlining recommendations that, among other things, included calls for watermarking content created by artificial intelligence models. For one, federal agencies will be required to provide watermarks or labels when using generative AI in their communications. This might seem like a good sign. But content authentication of this sort, within the very small ecosystem of, for example, notices from the IRS or other federal agencies, doesn't even begin to address the public's current problem: the creation, by highly capable generative models, of arbitrary quantities of inflammatory and harmful content.

This week's executive order builds on a set of voluntary agreements made in July between the Biden White House and large AI developers like OpenAI, Google, and Meta, which have promised to "develop tools or APIs [application programming interfaces] to determine if a particular piece of content was created with their system."

Perhaps voluntary agreements of this kind could help, especially if they were hardened into regulation; intuitively, at least, tools for detecting model-generated content hold more promise of mitigating the relevant risks. But the promised tools are red herrings. They do not reduce the risk of being duped by model-generated image- and audio-based content. Pay very close attention to what model developers are promising: tools (or application programming interfaces) that determine whether a piece of content was generated using their system. They are promising, in effect, potentially quite sophisticated cryptographic watermarks for image- and audio-based content. But there are two problems with this approach.

The first, and most serious, problem is that watermarking doesn't in fact work to reduce the risk of mistaking model- for human-generated content. Researchers have demonstrated over and over again that every watermarking technique proposed so far is vulnerable to exploitation and evasion. And in any case, all of these techniques rely on watermarking the image or audio files themselves. Take a picture of an image with your phone, and the watermark is gone. Record an audio snippet of an audio file with your phone, and again the watermark is gone.
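To see why file-level watermarks are so fragile, consider a toy example. The Python sketch below is purely illustrative and assumes only numpy and Pillow; it uses a naive least-significant-bit watermark (not any developer's actual scheme) and a single JPEG re-encode as a rough stand-in for photographing a screen with a phone. The mark reads back perfectly from the original file and collapses to coin-flip noise after one re-encode.

```python
# Toy illustration (not any vendor's real scheme): a naive least-significant-bit
# (LSB) image watermark is destroyed by a single lossy re-encode.
import io

import numpy as np
from PIL import Image

rng = np.random.default_rng(0)

# A synthetic 256x256 grayscale "photo" and a 1-bit-per-pixel watermark pattern.
image = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
watermark_bits = rng.integers(0, 2, size=image.shape, dtype=np.uint8)

# Embed: overwrite each pixel's least significant bit with a watermark bit.
watermarked = (image & 0xFE) | watermark_bits


def recovered_fraction(pixels: np.ndarray) -> float:
    """Fraction of watermark bits still readable from the pixels' LSBs."""
    return float(np.mean((pixels & 1) == watermark_bits))


# Reading directly from the watermarked pixels, every bit comes back.
print("before re-encoding:", recovered_fraction(watermarked))  # 1.0

# One pass of JPEG compression (quality 90) scrambles the low-order bits,
# much as re-photographing or screenshotting a displayed image would.
buffer = io.BytesIO()
Image.fromarray(watermarked).save(buffer, format="JPEG", quality=90)
buffer.seek(0)
reencoded = np.asarray(Image.open(buffer))
print("after one JPEG re-encode:", recovered_fraction(reencoded))  # roughly 0.5, i.e., chance
```

Production watermarking schemes are far more robust than this toy, but they face the same basic constraint: any mark embedded in the pixels or samples themselves has to survive re-capture, cropping, and re-compression, and the research discussed above suggests that, against a motivated adversary, it does not.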

Perhaps governments could adopt regulations that require people disseminating or using content created by generative models to clearly identify the content as such.

Tech companies have already moved toward adopting such a regime. And if backed by legislation, a system like this could in effect make watermarking an industry standard and could even result in a world where removing watermarks is illegal. But even if watermarks on model-generated content become an industry standard, and removing them becomes illegal, such regulation is effectively toothless so long as it remains impossible to tell whether a piece of content that carries no watermark was generated by a model or by a human.

Consider the following analogy: It is illegal to use homemade currency to buy goods and services, and all homemade currency must be clearly labeled as such. But assume that it is possible to easily remove such labels, and that it is therefore impossible to tell whether a particular unit of currency is homemade or issued by the government. People couldn't then be expected to trust that any particular bill in circulation was issued by the government. The same goes, here, for model- and human-generated content.


The second problem with calls for watermarking involves the open-source nature of many very capable generative models. In a world where open-source models can generate content indistinguishable from human-generated content, the only thing a person can learn from a piece of content that lacks a watermark is that it was not created using a particular developer's closed-source, watermark-producing model. But this does not tell the person what he or she wants or needs to know: Was this piece of non-watermarked content generated by a model or by a human?

Public education as the start of a solution. What, then, should be done to reduce the flood of new risks from model-generated image- and audio-based content? Rather than promising tools that don't address the underlying problem, regulators and model developers should change tack. What is required is widespread, systematic public education. Recent research, much of it motivated by the perceived effect of misinformation on the 2016 US election, suggests that media and information literacy interventions can mitigate the effects and spread of fake news on social media. Distinguishing model- from human-generated content (as opposed to distinguishing fake news from genuine news) presents special challenges. But lessons from that research can be adapted to the present context.

As a first step, the broad public needs to understand exactly how capable present models are. Model developers should therefore publicly, visibly, and repeatedly demonstrate how easily their models can be used to generate audio and visual content that is indistinguishable from human-generated content. They should do this in controlled settings, with expert explanations of what enables the technology to work. They should also provide detailed accounts of the limits of existing detection techniques, accounts that acknowledge that existing techniques, such as watermarking, cannot provide assurance that a piece of content was not generated using a model. These efforts will help members of the public understand their own limits in distinguishing between human- and AI-generated media.

Again, education will not enable the public to “spot the bot”: indistinguishable things cannot be distinguished. Instead, public education should focus on teaching people not to trust the audio, visual, and text-based content they encounter unless it is from a known, vetted, reliable source. This is a difficult skill to acquire and exercise. It requires overcoming humans’ default disposition to trust the testimony of others. The rise of social media—and media fragmentation more generally—has made the challenge of knowing exactly when to trust what one hears, sees, or reads even more difficult. But the rise of generative models capable of creating synthetic media makes developing this epistemic skill essential for navigating a world where we can’t spot the bot.


