Can We Mathematically Spot the Possible Manipulation of Results in Research Manuscripts Using Benford’s Law?

The reproducibility of academic research has long been a persistent issue, contradicting one of the fundamental principles of science. Recently, there has been an increasing number of false claims found in academic manuscripts, casting doubt on the validity of reported results. In this paper, we uti...


Bibliographic Details
Main Authors: Teddy Lazebnik, Dan Gorlitsky
Format: Article
Language: English
Published: MDPI AG 2023-10-01
Series: Data
Subjects: statistical analysis; anomaly detection; first digit law; results reproduction
Online Access: https://www.mdpi.com/2306-5729/8/11/165
author Teddy Lazebnik
Dan Gorlitsky
collection DOAJ
description The reproducibility of academic research has long been a persistent issue, contradicting one of the fundamental principles of science. Recently, there has been an increasing number of false claims found in academic manuscripts, casting doubt on the validity of reported results. In this paper, we utilize an adapted version of Benford’s law, a statistical phenomenon that describes the distribution of leading digits in naturally occurring datasets, to identify the potential manipulation of results in research manuscripts, solely using the aggregated data presented in those manuscripts rather than the commonly unavailable raw datasets. Our methodology applies the principles of Benford’s law to commonly employed analyses in academic manuscripts, thus reducing the need for the raw data itself. To validate our approach, we employed 100 open-source datasets and correctly predicted 79% of them using our rules. Moreover, we tested the proposed method on known retracted manuscripts, showing that around half (48.6%) can be detected using the proposed method. Additionally, we analyzed 100 manuscripts published in the last two years across ten prominent economic journals, with 10 manuscripts randomly sampled from each journal. Our analysis predicted a 3% occurrence of results manipulation with a 96% confidence level. Our findings show that Benford’s law, adapted for aggregated data, can be an initial tool for identifying data manipulation; however, it is not a silver bullet, requiring further investigation for each flagged manuscript due to the relatively low prediction accuracy.
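For readers unfamiliar with the first-digit law named in the description, the following is a minimal sketch of the classical form of Benford's law only: the expected probability of leading digit d is log10(1 + 1/d), and a chi-square statistic measures how far observed leading digits deviate from it. This is not the authors' adapted method for aggregated data, which the record does not specify; the function names (`benford_expected`, `leading_digit`, `benford_chi_square`) are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def benford_expected():
    """Classical first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a nonzero number."""
    x = abs(float(x))
    if x == 0:
        raise ValueError("zero has no leading digit")
    # In scientific notation the first character is the leading digit.
    return int(f"{x:.16e}"[0])


def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford's law.

    Zeros are skipped. The statistic has 8 degrees of freedom, so values
    above roughly 15.507 suggest a deviation at the 5% level.
    """
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero values to test")
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


if __name__ == "__main__":
    # Powers of 2 are a standard example of a Benford-like sequence.
    powers = [2 ** k for k in range(1, 201)]
    print("powers of 2:", round(benford_chi_square(powers), 3))
    # Evenly spread leading digits deviate strongly from the law.
    uniform = [d for d in range(1, 10) for _ in range(100)]
    print("uniform digits:", round(benford_chi_square(uniform), 3))
```

A large statistic only marks a dataset for closer inspection; as the description notes, a flag of this kind is a starting point and not proof of manipulation.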
first_indexed 2024-03-09T16:54:57Z
format Article
id doaj.art-a4056e8c07984852bba6a241eba186e4
institution Directory Open Access Journal
issn 2306-5729
language English
last_indexed 2024-03-09T16:54:57Z
publishDate 2023-10-01
publisher MDPI AG
record_format Article
series Data
spelling doaj.art-a4056e8c07984852bba6a241eba186e4
2023-11-24T14:37:18Z
eng
MDPI AG
Data
2306-5729
2023-10-01
Volume 8, Issue 11, Article 165
10.3390/data8110165
Can We Mathematically Spot the Possible Manipulation of Results in Research Manuscripts Using Benford’s Law?
Teddy Lazebnik (Department of Cancer Biology, Cancer Institute, University College London, London WC1E 6BT, UK)
Dan Gorlitsky (Department of Economics, Reichman University, Herzliya 4610101, Israel)
https://www.mdpi.com/2306-5729/8/11/165
statistical analysis
anomaly detection
first digit law
results reproduction
title Can We Mathematically Spot the Possible Manipulation of Results in Research Manuscripts Using Benford’s Law?
topic statistical analysis
anomaly detection
first digit law
results reproduction
url https://www.mdpi.com/2306-5729/8/11/165