Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets

Automatic satire identification can help to identify texts in which the intended meaning differs from the literal meaning, improving tasks such as sentiment analysis, fake news detection or natural-language user interfaces. Typically, satire identification is performed by training a supervised class...

Bibliographic Details
Main Authors:	Óscar Apolinario-Arzube, José Antonio García-Díaz, José Medina-Moreira, Harry Luna-Aveiga, Rafael Valencia-García
Format:	Article
Language:	English
Published:	MDPI AG 2020-11-01
Series:	Mathematics
Subjects:	automatic satire identification; text classification; natural language processing
Online Access:	https://www.mdpi.com/2227-7390/8/11/2075

_version_	1797547291057324032
author	Óscar Apolinario-Arzube José Antonio García-Díaz José Medina-Moreira Harry Luna-Aveiga Rafael Valencia-García
author_sort	Óscar Apolinario-Arzube
collection	DOAJ
description	Automatic satire identification can help to identify texts in which the intended meaning differs from the literal meaning, improving tasks such as sentiment analysis, fake news detection or natural-language user interfaces. Typically, satire identification is performed by training a supervised classifier to find linguistic clues that can determine whether a text is satirical or not. For this, the state of the art relies on neural networks fed with word embeddings that are capable of learning interesting characteristics regarding the way humans communicate. However, to the best of our knowledge, there are no comprehensive studies that evaluate these techniques in Spanish in the satire identification domain. Consequently, in this work we evaluate several deep-learning architectures with Spanish pre-trained word embeddings and compare the results with strong baselines based on term-counting features. This evaluation is performed with two datasets that contain satirical and non-satirical tweets written in two Spanish variants: European Spanish and Mexican Spanish. Our experimentation revealed that term-counting features achieved similar results to deep-learning approaches based on word embeddings, both outperforming previous results based on linguistic features. Our results suggest that term-counting features and traditional machine-learning models provide competitive results regarding automatic satire identification, slightly outperforming state-of-the-art models.
first_indexed	2024-03-10T14:42:15Z
format	Article
id	doaj.art-b51bf8fa62114428b4085d8d27ef4f3b
institution	Directory Open Access Journal
issn	2227-7390
language	English
last_indexed	2024-03-10T14:42:15Z
publishDate	2020-11-01
publisher	MDPI AG
record_format	Article
series	Mathematics
spelling	doaj.art-b51bf8fa62114428b4085d8d27ef4f3b; 2023-11-20T21:42:42Z; eng; MDPI AG; Mathematics; 2227-7390; 2020-11-01; vol. 8, no. 11, art. 2075; 10.3390/math8112075; Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets; Óscar Apolinario-Arzube [0]; José Antonio García-Díaz [1]; José Medina-Moreira [2]; Harry Luna-Aveiga [3]; Rafael Valencia-García [4]; [0] Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Cdla, Universitaria Salvador Allende, Guayaquil 090514, Ecuador; [1] Facultad de Informática, Universidad de Murcia, Campus de Espinardo, 30100 Murcia, Spain; [2] Facultad de Ciencias Agrarias, Universidad Agraria del Ecuador, Av. 25 de Julio, Guayaquil 090114, Ecuador; [3] Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Cdla, Universitaria Salvador Allende, Guayaquil 090514, Ecuador; [4] Facultad de Informática, Universidad de Murcia, Campus de Espinardo, 30100 Murcia, Spain; https://www.mdpi.com/2227-7390/8/11/2075; automatic satire identification; text classification; natural language processing
title	Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets
title_sort	comparing deep learning architectures and traditional machine learning approaches for satire identification in spanish tweets
topic	automatic satire identification; text classification; natural language processing
url	https://www.mdpi.com/2227-7390/8/11/2075

Similar Items