A Benchmark Dataset to Distinguish Human-Written and Machine-Generated Scientific Papers

As generative NLP can now produce content nearly indistinguishable from human writing, it is becoming difficult to identify genuine research contributions in academic writing and scientific publications. Moreover, information in machine-generated text can be factually wrong or even entirely fabricated. In this work, we introduce a novel benchmark dataset containing human-written and machine-generated scientific papers from SCIgen, GPT-2, GPT-3, ChatGPT, and Galactica, as well as papers co-created by humans and ChatGPT. We also experiment with several types of classifiers—linguistic-based and transformer-based—for detecting the authorship of scientific text. A strong focus is put on generalization capabilities and explainability to highlight the strengths and weaknesses of these detectors. Our work makes an important step towards creating more robust methods for distinguishing between human-written and machine-generated scientific papers, ultimately ensuring the integrity of scientific literature.
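
The abstract mentions experiments with linguistic-based and transformer-based detectors. As a rough, hypothetical illustration of the detection task only (not the authors' code; the file name benchmark_papers.csv and its text/label columns are assumptions about how such a benchmark might be stored), a minimal bag-of-words baseline could look like this:

```python
# Minimal baseline sketch: a bag-of-words classifier separating
# human-written from machine-generated paper text.
# Assumed layout: one row per paper, label 0 = human, 1 = machine-generated.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("benchmark_papers.csv")  # hypothetical file name and columns
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Word-level TF-IDF unigrams and bigrams feed a linear classifier.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=50_000, sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
detector.fit(X_train, y_train)

# Per-class precision/recall on the held-out split.
print(classification_report(y_test, detector.predict(X_test)))
```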

Bibliographic Details
Main Authors: Mohamed Hesham Ibrahim Abdalla, Simon Malberg, Daryna Dementieva, Edoardo Mosca, Georg Groh (all at the School of Computation, Information and Technology, Technical University of Munich, 80333 Munich, Germany)
Format: Article
Language: English
Published: MDPI AG, 2023-09-01
Series: Information, Vol. 14, Issue 10, Article 522
ISSN: 2078-2489
DOI: 10.3390/info14100522
Collection: Directory of Open Access Journals (DOAJ)
Subjects: text generation; large language models; machine-generated text detection
Online Access: https://www.mdpi.com/2078-2489/14/10/522