LEIA: Linguistic Embeddings for the Identification of Affect

Abstract The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This af...

Bibliographic Details
Main Authors: Segun Taofeek Aroyehun, Lukas Malik, Hannah Metzler, Nikolas Haimerl, Anna Di Natale, David Garcia
Format: Article
Language: English
Published: SpringerOpen 2023-11-01
Series: EPJ Data Science
Subjects: Emotion detection, Natural language processing, Social media, Transfer learning
Online Access: https://doi.org/10.1140/epjds/s13688-023-00427-0
author Segun Taofeek Aroyehun
Lukas Malik
Hannah Metzler
Nikolas Haimerl
Anna Di Natale
David Garcia
author_sort Segun Taofeek Aroyehun
collection DOAJ
description Abstract The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA’s robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer.
first_indexed 2024-03-10T22:09:08Z
format Article
id doaj.art-2c0c9754e01746b0a6b10b3afc491faf
institution Directory Open Access Journal
issn 2193-1127
language English
last_indexed 2024-03-10T22:09:08Z
publishDate 2023-11-01
publisher SpringerOpen
record_format Article
series EPJ Data Science
spelling doaj.art-2c0c9754e01746b0a6b10b3afc491faf
2023-11-19T12:41:00Z
eng
SpringerOpen
EPJ Data Science
2193-1127
2023-11-01
121121
10.1140/epjds/s13688-023-00427-0
LEIA: Linguistic Embeddings for the Identification of Affect
Segun Taofeek Aroyehun (Department of Politics and Public Administration, University of Konstanz)
Lukas Malik (Complexity Science Hub)
Hannah Metzler (Medical University of Vienna)
Nikolas Haimerl (Vienna University of Technology)
Anna Di Natale (Medical University of Vienna)
David Garcia (Department of Politics and Public Administration, University of Konstanz)
https://doi.org/10.1140/epjds/s13688-023-00427-0
Emotion detection
Natural language processing
Social media
Transfer learning
title LEIA: Linguistic Embeddings for the Identification of Affect
title_sort leia linguistic embeddings for the identification of affect
topic Emotion detection
Natural language processing
Social media
Transfer learning
url https://doi.org/10.1140/epjds/s13688-023-00427-0