Even the best poker players have “tells” that give away when they’re bluffing. Scientists who commit fraud have similar, but more subtle, tells. Now researchers say they have cracked the writing patterns of scientists who attempt to pass along falsified data.
The findings could eventually help identify shady research before it’s published.
There is a fair amount of research dedicated to understanding the ways liars lie. Studies have shown that liars tend to express more negative emotion terms and use fewer first-person pronouns. Fraudulent financial reports typically display higher levels of linguistic obfuscation—phrasing that is meant to distract from or conceal the fake data—than accurate reports.
To see if similar patterns exist in scientific academia, researchers searched the archives of PubMed, a database of life sciences journals, from 1973 to 2013 for retracted papers. They identified 253, primarily from biomedical journals, that were retracted for documented fraud and compared the writing in these to unretracted papers from the same journals and publication years, and covering the same topics.
They then scored each paper with a customized “obfuscation index,” which measures the degree to which the authors attempted to mask their false results. The index is a summary score of causal terms, abstract language, jargon, positive emotion terms, and a standardized ease-of-reading score.
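An index like the one described could be sketched as a combination of per-100-word rates over word categories. This is only an illustration: the word lists below are invented for demonstration (the study used established linguistic categories and a standardized readability score, neither of which is reproduced here), and the sign conventions are assumptions.

```python
# Illustrative sketch of an "obfuscation index" in the spirit described above.
# The word lists and the weighting are invented for demonstration; they are
# NOT the categories or coefficients used in the actual study.
import re

CAUSAL = {"because", "cause", "effect", "hence", "therefore", "thus"}
JARGON = {"methodology", "paradigm", "heterogeneity", "modality", "utilize"}
POSITIVE = {"good", "great", "novel", "important", "significant", "robust"}

def rate_per_100(words, lexicon):
    """Occurrences of lexicon terms per 100 words."""
    hits = sum(1 for w in words if w in lexicon)
    return 100.0 * hits / max(len(words), 1)

def obfuscation_index(text):
    """Higher score = more causal terms and jargon, fewer positive emotion
    terms (fraudulent authors reportedly curb praise for their data).
    The additive combination is an assumption for illustration."""
    words = re.findall(r"[a-z']+", text.lower())
    return (rate_per_100(words, CAUSAL)
            + rate_per_100(words, JARGON)
            - rate_per_100(words, POSITIVE))

sample = "We utilize a novel methodology because the effect is robust."
print(round(obfuscation_index(sample), 1))  # → 20.0
```

A real implementation would draw its categories from a validated dictionary and standardize each component before summing, so that no single category dominates the score.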
“We believe the underlying idea behind obfuscation is to muddle the truth,” says David Markowitz, a graduate student working with Jeff Hancock, professor of communication at Stanford University.
“Scientists faking data know that they are committing misconduct and do not want to get caught. Therefore, one strategy to evade this may be to obscure parts of the paper. We suggest that language can be one of many variables to differentiate between fraudulent and genuine science.”
The results, published in the Journal of Language and Social Psychology, show that fraudulent retracted papers scored significantly higher on the obfuscation index than papers retracted for other reasons. For example, fraudulent papers contained approximately 1.5 percent more jargon than unretracted papers.
“Fraudulent papers had about 60 more jargon-like words per paper compared to unretracted papers,” Markowitz says. “This is a non-trivial amount.”
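The two figures quoted can be checked against each other. Reading “1.5 percent more jargon” as a 1.5-percentage-point difference in jargon rate (an interpretation on our part, since the article does not define it), the implied average paper length follows from simple arithmetic:

```python
# Back-of-the-envelope consistency check for the two quoted figures.
# Assumption: "1.5 percent more jargon" means 1.5 percentage points of
# jargon rate; the average paper length is implied, not stated in the article.
extra_words = 60       # ~60 more jargon-like words per fraudulent paper
extra_rate = 0.015     # 1.5 percentage points, as a fraction
implied_length = extra_words / extra_rate
print(int(implied_length))  # → 4000 words per paper (implied)
```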
The researchers say that scientists might commit data fraud for a variety of reasons. Previous research points to a “publish or perish” mentality that may motivate researchers to manipulate their findings or fake studies altogether.
But the change researchers found in the writing is directly related to the author’s goals of covering up lies through the manipulation of language. For instance, a fraudulent author may use fewer positive emotion terms to curb praise for the data, for fear of triggering inquiry.
In the future, a computerized system might be able to flag a submitted paper so that editors could give it a more critical review before publication, depending on the journal’s threshold for obfuscated language. But the authors warn that this approach isn’t currently feasible given the false-positive rate.
“Science fraud is of increasing concern in academia, and automatic tools for identifying fraud might be useful,” Hancock says. “But much more research is needed before considering this kind of approach.
“Obviously, there is a very high error rate that would need to be improved. But also science is based on trust, and introducing a ‘fraud detection’ tool into the publication process might undermine that trust.”