Evolutionary Optimization of Ensemble Learning to Determine Sentiment Polarity in an Unbalanced Multiclass Corpus

Bibliographic Details
Main Authors: Consuelo V. García-Mendoza, Omar J. Gambino, Miguel G. Villarreal-Cervantes, Hiram Calvo
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Entropy
Online Access: https://www.mdpi.com/1099-4300/22/9/1020
Description: Sentiment polarity classification in social media is an important task, as it makes it possible to identify trends on particular subjects from a set of opinions. Considerable progress has recently been made with deep learning techniques such as word embeddings, recurrent neural networks, and encoders such as BERT. Unfortunately, these techniques require large amounts of data, which in some cases are not available. To model this situation, challenges such as the Spanish TASS, organized by the Spanish Society for Natural Language Processing (SEPLN), have been proposed, and they pose particular difficulties. First, the training and test sets are very unevenly sized: the test set is more than eight times the size of the training set. Second, the class distribution is markedly unbalanced, and it also differs between the two sets. Finally, there are four different labels, so current classification methods must be adapted to multiclass handling. Traditional machine learning methods such as Naïve Bayes, Logistic Regression, and Support Vector Machines achieve modest performance under these conditions, but combined as an ensemble they can attain competitive performance. Several strategies for building classifier ensembles have been proposed; this paper proposes estimating an optimal weighting scheme with a Differential Evolution algorithm designed to deal with the particular issues that multiclass classification and unbalanced corpora pose. The ensemble with the proposed optimized weighting scheme improves the classification results on the full test set of the TASS challenge (General corpus), achieving state-of-the-art performance when compared with other works on this task, which make no use of NLP techniques.
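The abstract names only the ingredients (Naïve Bayes, Logistic Regression and Support Vector Machine base classifiers, a weighting scheme, Differential Evolution); it does not give the weight encoding, the fitness function or the DE variant. The sketch below is therefore an illustration under stated assumptions, not the authors' method: it assumes weighted soft voting over per-class probabilities, uses macro-averaged F1 on a validation split as the fitness (one common choice for unbalanced multiclass data), and uses the textbook DE/rand/1/bin scheme. All function names and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# probs[m][i][c]: probability that classifier m assigns to class c for example i
# (hypothetical layout, e.g. outputs of Naive Bayes, Logistic Regression and an SVM).
Probs = List[List[List[float]]]


def weighted_vote(probs: Probs, weights: Sequence[float]) -> List[int]:
    """Weighted soft voting: argmax over c of sum_m weights[m] * probs[m][i][c]."""
    n_examples, n_classes = len(probs[0]), len(probs[0][0])
    preds = []
    for i in range(n_examples):
        scores = [sum(w * p[i][c] for w, p in zip(weights, probs))
                  for c in range(n_classes)]
        preds.append(max(range(n_classes), key=scores.__getitem__))  # ties -> lowest class
    return preds


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1, so rare classes weigh as much as frequent ones."""
    total = 0.0
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / n_classes


def differential_evolution(fitness: Callable[[List[float]], float], dim: int,
                           pop_size: int = 20, generations: int = 100,
                           mutation: float = 0.5, crossover: float = 0.9,
                           lo: float = 0.0, hi: float = 1.0,
                           seed: int = 0) -> Tuple[List[float], float]:
    """Classic DE/rand/1/bin that maximizes `fitness` over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        for j in range(pop_size):
            # Three distinct donor vectors, all different from the target vector j.
            a, b, c = rng.sample([k for k in range(pop_size) if k != j], 3)
            forced = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = [
                min(hi, max(lo, pop[a][d] + mutation * (pop[b][d] - pop[c][d])))
                if d == forced or rng.random() < crossover else pop[j][d]
                for d in range(dim)
            ]
            score = fitness(trial)
            if score >= scores[j]:  # greedy one-to-one replacement
                pop[j], scores[j] = trial, score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Made-up validation set: 4 classes, 3 base classifiers.
y_val = [0, 0, 1, 2, 3, 3]


def one_hot(c: int, n: int = 4) -> List[float]:
    return [1.0 if k == c else 0.0 for k in range(n)]


probs = [
    [one_hot(t) for t in y_val],  # classifier A: always correct
    [one_hot(0) for _ in y_val],  # classifier B: always predicts class 0
    [one_hot(1) for _ in y_val],  # classifier C: always predicts class 1
]

best_weights, best_score = differential_evolution(
    lambda w: macro_f1(y_val, weighted_vote(probs, w), 4), dim=3)
print("equal weights:", macro_f1(y_val, weighted_vote(probs, [1.0, 1.0, 1.0]), 4))
print("DE weights:", [round(w, 3) for w in best_weights], "macro-F1:", best_score)
```

With equal weights the two biased classifiers outvote or tie the accurate one on the rarer classes, giving a macro-F1 of 11/28 (about 0.39) on this toy set; the DE search instead settles on weights that favour classifier A. Because multiplying all weights by a positive constant does not change the argmax, the sketch does not normalize them.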
Citation: Entropy, 2020, vol. 22, no. 9, article 1020
ISSN: 1099-4300
DOI: 10.3390/e22091020
Affiliations: Escuela Superior de Cómputo, Instituto Politécnico Nacional, Mexico City 07738, Mexico (Consuelo V. García-Mendoza, Omar J. Gambino); Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Mexico City 07700, Mexico (Miguel G. Villarreal-Cervantes); Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City 07738, Mexico (Hiram Calvo)
Keywords: sentiment polarity; ensemble learning; unbalanced classes; evolutionary optimization; Twitter sentiment analysis