The Limitations of Stylometry for Detecting Machine-Generated Fake News
© 2020 Association for Computational Linguistics. Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake news by capturing its stylistic differences from human-written text. These approaches, broadly termed stylometry, have found success in source attribution and misinformation detection in human-written texts. However, in this work, we show that stylometry is limited against machine-generated misinformation. Whereas humans speak differently when trying to deceive, LMs generate stylistically consistent text, regardless of underlying motive. Thus, though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information. We create two benchmarks demonstrating the stylistic similarity between malicious and legitimate uses of LMs in auto-completion and editing-assistance settings. Our findings highlight the need for non-stylometric approaches to detecting machine-generated misinformation, and open the discussion on desired evaluation benchmarks.
| Main Authors: | Schuster, Tal; Schuster, Roei; Shah, Darsh J; Barzilay, Regina |
| --- | --- |
| Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
| Format: | Article |
| Language: | English |
| Published: | MIT Press - Journals, 2021 |
| Journal: | Computational Linguistics |
| DOI: | 10.1162/COLI_A_00380 |
| License: | Creative Commons Attribution-NonCommercial-NoDerivs 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/) |
| Online Access: | https://hdl.handle.net/1721.1/135419 |