“Bosom peril” is not “breast cancer”: How weird computer-generated phrases help researchers find scientific publishing fraud

By Guillaume Cabanac, Cyril Labbé, Alexander Magazinov | January 13, 2022

In 2020, despite the COVID pandemic, scientists authored 6 million peer-reviewed publications, a 10 percent increase compared to 2019. At first glance this big number seems like a good thing, a positive indicator of science advancing and knowledge spreading. Among these millions of papers, however, are thousands of fabricated articles, many from academics who feel compelled by a publish-or-perish mentality to produce, even if it means cheating.

But in a new twist to the age-old problem of academic fraud, modern plagiarists are making use of software and perhaps even emerging AI technologies to draft articles—and they’re getting away with it.

The growth in research publication, combined with the availability of new digital technologies, suggests computer-mediated fraud in scientific publication is only likely to get worse. Fraud like this not only affects the researchers and publications involved, but it can complicate scientific collaboration and slow down the pace of research. Perhaps the most dangerous outcome is that fraud erodes the public’s trust in scientific research. Finding these cases is therefore a critical task for the scientific community.

We have been able to spot fraudulent research thanks in large part to one key tell that an article has been artificially manipulated: The nonsensical “tortured phrases” that fraudsters use in place of standard terms to avoid anti-plagiarism software. Our computer system, which we named the Problematic Paper Screener, searches through published science and seeks out tortured phrases in order to find suspect work. While this method works, as AI technology improves, spotting these fakes will likely become harder, raising the risk that more fake science makes it into journals.

What are tortured phrases? A tortured phrase is an established scientific concept paraphrased into a nonsensical sequence of words. “Artificial intelligence” becomes “counterfeit consciousness.” “Mean square error” becomes “mean square blunder.” “Signal to noise” becomes “flag to clamor.” “Breast cancer” becomes “bosom peril.” Teachers may have noticed some of these phrases in the work of students who use paraphrasing tools to evade plagiarism checks and get good grades.
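To illustrate how a search for tortured phrases could work in principle, here is a minimal Python sketch. It is not the code of the Problematic Paper Screener; the lookup table holds only the four examples given above, and the names `TORTURED_PHRASES` and `find_tortured_phrases` are hypothetical, chosen for this illustration.

```python
import re

# Tortured phrase -> expected term, limited to the four examples in the text.
# A real screening list would be far longer; this table is illustrative only.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "mean square blunder": "mean square error",
    "flag to clamor": "signal to noise",
    "bosom peril": "breast cancer",
}


def find_tortured_phrases(text):
    """Return (tortured phrase, expected term) pairs found in text, in table order."""
    # Lowercase and collapse whitespace so line breaks do not hide a match.
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        # Word boundaries keep the phrase from matching inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
            hits.append((phrase, expected))
    return hits


sample = "We apply counterfeit consciousness to reduce the mean square\nblunder."
for phrase, expected in find_tortured_phrases(sample):
    print(f"{phrase!r} probably stands for {expected!r}")
```

A fixed list like this only catches phrases someone has already spotted, so the table has to grow as new tortured phrases are reported.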

As of January 2022, we’ve found tortured phrases in 3,191 published peer-reviewed articles (and counting), including in reputable flagship publications. The two countries listed most frequently in the authors’ affiliations are India (71.2 percent) and China (6.3 percent). In one specific journal that had a high prevalence of tortured phrases, we also noticed that the time between when an article was submitted and when it was accepted for publication declined from an average of 148 days in early 2020 to 42 days in early 2021. Many of these articles had authors affiliated with institutions in India and China, where the pressure to publish may be exceedingly high.

In China, for example, institutions have been documented to impose production targets that are nearly impossible to meet. Doctors affiliated with Chinese hospitals, for instance, have to get published to get promoted, but many are too busy in the hospital to do so.

Tortured phrases also star in “lazy surveys” of the literature: Someone copies abstracts from papers, paraphrases them, and pastes them in a document to form gibberish devoid of any meaning.

Our best guess for the source of tortured phrases is that authors are using automated paraphrasing tools—dozens can be easily found online. Crooked scientists are using these tools to copy text from various genuine sources, paraphrase them, and paste the “tortured” result into their own papers. How do we know this? A strong piece of evidence is that one can reproduce most tortured phrases by feeding established terms into paraphrasing software.
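The mechanism can be sketched with a toy word-by-word synonym swap in Python. This is an assumption about how such tools behave, not the code of any real paraphrasing product; the `SYNONYMS` table is hypothetical and was built by hand from the tortured phrases quoted in this article.

```python
# Toy synonym table, hand-built from the tortured phrases quoted in the article.
# Real paraphrasing tools are assumed to draw on much larger thesauri.
SYNONYMS = {
    "artificial": "counterfeit",
    "intelligence": "consciousness",
    "error": "blunder",
    "signal": "flag",
    "noise": "clamor",
    "breast": "bosom",
    "cancer": "peril",
}


def naive_paraphrase(term):
    """Swap each word for its listed synonym, ignoring the technical context."""
    words = term.lower().split()
    # Words without an entry in the table are left unchanged.
    return " ".join(SYNONYMS.get(word, word) for word in words)


for term in ["Artificial intelligence", "Mean square error", "Signal to noise"]:
    print(f"{term!r} -> {naive_paraphrase(term)!r}")
```

Because each word is replaced on its own, a fixed technical term loses its meaning even though every individual substitution is a plausible everyday synonym.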

Using paraphrasing software can introduce factual errors. Replacing a word with its lay-language synonym may change the scientific meaning. For example, in engineering literature, when “accuracy” replaces “precision” (or vice versa), two different notions are mixed up; the text is not only paraphrased but becomes wrong.

We also found published papers that appear to have been partly generated with AI language models like GPT-2, a system developed by OpenAI. Unlike papers where authors seem to have used paraphrasing software, which changes existing text, these AI models can produce text out of whole cloth.

While computer programs that can create science or math articles have been around for almost two decades (like SCIgen, a program developed by MIT graduate students in 2005 to create science papers, or Mathgen, which has been producing math papers since 2012), the newer AI language models present a thornier problem. Unlike the pure nonsense produced by Mathgen or SCIgen, the output of the AI systems is much harder to detect. For example, given the beginning of a sentence as a starting point, a model like GPT-2 can complete the sentence and even generate entire paragraphs. Some papers appear to be produced by these systems. We screened a sample of about 140,000 abstracts of papers published by Elsevier, an academic publisher, in 2021 with OpenAI’s GPT-2 detector. Hundreds of suspect papers featuring synthetic text appeared in dozens of reputable journals.

AI could compound an existing problem in academic publishing—the paper mills that churn out articles for a price—by making paper mill fakes easier to produce and harder to suss out.

How we found tortured phrases. We spotted our first tortured phrase last spring while reviewing various papers for suspicious abnormalities, like evidence of citation gaming or references to predatory journals. Ever heard of “profound neural organization”? Computer scientists may recognize this as a distorted reference to a “deep neural network.” This led us to search for the phrase across the entire scientific literature, where we found several other articles with the same bizarre language, some of which contained other tortured phrases as well. Finding more and more articles with more and more tortured phrases (473 such phrases as of January 2022), we realized that the problem was big enough to be called out in public.

To track papers with tortured phrases, as well as meaningless papers produced by SCIgen or Mathgen (which have also made it into publications), we developed the Problematic Paper Screener. Behind the curtains, the software relies on open science tools to search for tortured phrases in scientific papers and to check whether others had already flagged issues. Finding problematic papers with tortured phrases has become a crowd effort, as researchers have used our software to find new phrases.

The problem of tortured phrases. Scientific editors and referees certainly reject buggy submissions with tortured phrases, but a fraction still evades their vigilance and gets published. This means researchers could waste time filtering through published scams. Another problem is that interdisciplinary research could get bogged down by unreliable research; for example, a public health expert might want to collaborate with a computer scientist who published about a diagnostic tool in a fraudulent paper.

And as computers do more aggregating work, faulty articles could also jeopardize future AI-based research tools. For example, in 2019, the publisher Springer Nature used AI to analyze 1,086 publications and generate a handbook on lithium-ion batteries. The AI created “coherent chapters and sections” and “succinct summaries of the articles.” What if the source material for these sorts of projects were to include nonsensical, tortured publications?

The presence of this junk pseudo-scientific literature also undermines citizens’ trust in scientists and science, especially when it gets dragged into public policy debates.

Recently tortured phrases have even turned up in scientific literature on the COVID-19 pandemic. One paper published in July 2020, since retracted, was cited 52 times as of this month, despite mentioning the phrase “extreme intense respiratory syndrome (SARS),” which is clearly a reference to severe acute respiratory syndrome, the disease caused by the coronavirus SARS-CoV-1. Other papers contained the same tortured phrase.

Once fraudulent papers are found, getting them retracted is no easy task.

Editors and publishers who are members of the Committee on Publication Ethics must follow pre-established complex guidelines when they find problematic papers. But the process has a loophole. Publishers “investigate the issue” for months or years because they are supposed to wait for answers and explanations from authors for an undefined amount of time.

AI will help detect meaningless papers, erroneous ones, or those featuring tortured phrases. But this will be effective only in the short to medium term. AI checking tools could end up provoking an arms race in the longer term, when text-generating tools are pitted against those that detect artificial texts, potentially leading to ever-more-convincing fakes.

But there are a few steps academia can take to address the problem of fraudulent papers.

Apart from a sense of achievement, there is no clear incentive for a reviewer to deliver a thoughtful critique of a submitted paper, and no direct penalty for careless peer review. Incentivizing stricter checks during peer review and after a paper is published will alleviate the problem. Promoting post-publication peer review at PubPeer.com, where researchers can critique articles in an unofficial context, and encouraging other ways to engage the research community more broadly could shed light on suspicious science.

In our view, the emergence of tortured phrases is a direct consequence of the publish-or-perish system. Scientists and policy makers need to question the intrinsic value of racking up high article counts as the most important career metric. Other kinds of output must be rewarded, including proper peer reviews, data sets, preprints, and post-publication discussions. If we act now, we have a chance to pass a sustainable scientific environment on to future generations of researchers.

Keywords: AI, GPT-2, academic publishing
Topics: Disruptive Technologies
