A qualitative analysis of sarcasm, irony and related #hashtags on Twitter

As the use of automated social media analysis tools surges, concerns over the accuracy of analytics have increased. Some tentative evidence suggests that sarcasm alone could account for as much as a 50% drop in accuracy when automatically detecting sentiment. This paper assesses and outlines the prevalence of sarcastic and ironic language within social media posts. Several past studies proposed models for automatic sarcasm and irony detection for sentiment analysis; however, these approaches result in models trained on training data of highly questionable quality, with little qualitative appreciation of the underlying data. To understand the issues and scale of the problem, we are the first to conduct and present results of a focused manual semantic annotation analysis of two datasets of Twitter messages (4334 tweets in total), associated with: (i) hashtags commonly employed in automated sarcasm and irony detection approaches, and (ii) tweets relating to 25 distinct events, including scandals, product releases, cultural events, accidents and terror incidents. We also highlight the contextualised use of multi-word hashtags in the communication of humour, sarcasm and irony, pointing out that many sentiment analysis tools simply fail to recognise such hashtag-based expressions. Our findings also offer indicative evidence regarding the quality of training data used for automated machine learning models in sarcasm, irony and sentiment detection. Worryingly, only 15% of tweets labelled as sarcastic were truly sarcastic. We highlight the need for future research studies to rethink their approach to data preparation and to interpret sentiment analysis results more carefully.

Bibliographic Details
Main Authors: Martin Sykora, Suzanne Elayan, Thomas W Jackson
Format: Article
Language: English
Published: SAGE Publishing, 2020-11-01
Series: Big Data & Society
Online Access: https://doi.org/10.1177/2053951720972735
ISSN: 2053-9517