The Limitations of Stylometry for Detecting Machine-Generated Fake News

Bibliographic Details
Main Authors: Schuster, Tal; Schuster, Roei; Shah, Darsh J; Barzilay, Regina
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: MIT Press - Journals, 2021
Journal: Computational Linguistics
DOI: 10.1162/COLI_A_00380
License: Creative Commons Attribution-NonCommercial-NoDerivs 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Online Access: https://hdl.handle.net/1721.1/135419
Description: © 2020 Association for Computational Linguistics. Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake news by capturing their stylistic differences from human-written text. These approaches, broadly termed stylometry, have found success in source attribution and misinformation detection in human-written texts. However, in this work, we show that stylometry is limited against machine-generated misinformation. Whereas humans speak differently when trying to deceive, LMs generate stylistically consistent text, regardless of underlying motive. Thus, though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information. We create two benchmarks demonstrating the stylistic similarity between malicious and legitimate uses of LMs, utilized in auto-completion and editing-assistance settings. Our findings highlight the need for non-stylometry approaches in detecting machine-generated misinformation, and open up the discussion on the desired evaluation benchmarks.
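
To make the abstract's terminology concrete, below is a minimal sketch of the kind of stylometric provenance classifier it refers to: character n-gram features feeding a linear model trained to predict whether a text is human- or machine-written. This is not the authors' experimental setup; the feature choice, the model, and the toy texts and labels are all illustrative assumptions, included only so the snippet runs end to end.

```python
# A minimal sketch of a stylometry-based provenance detector, in the
# spirit of the approaches this paper critiques. NOT the authors' setup;
# the toy corpus below is a hypothetical placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus: label 1 = machine-generated, 0 = human-written.
texts = [
    "The senator announced a new infrastructure bill on Tuesday.",
    "Officials confirmed the merger after months of negotiation.",
    "Local residents gathered to protest the proposed zoning change.",
    "The model is released today, and it is very good, and it is new.",
    "The company said the product is the best product of the products.",
    "The report is a report about the event that happened at the event.",
]
labels = [0, 0, 0, 1, 1, 1]

# Character n-grams capture surface style (punctuation habits, function
# words, morphology) rather than factual content.
detector = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)

# Whatever the detector predicts here reflects surface style, not
# whether the claim is true; that gap is the paper's core point.
print(detector.predict(["The city said the plan is a good plan for the plan."]))
```

The design choice is what the paper's argument turns on: because the features encode style rather than content, such a classifier can attribute provenance (human vs. machine) but, as the abstract notes, cannot separate a legitimate LM application, such as auto-completion or editing assistance, from one that introduces false information.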