Summary: | Sentiment classification is a natural language processing task that identifies the opinions expressed in texts such as product or service reviews. In this work, we analyze the effects of different combinations of deep-learning models, embedding methods, and tokenization approaches on sentiment classification. We feed non-contextualized (Word2Vec and GloVe) and contextualized (BERT and RoBERTa/XLM-RoBERTa) embeddings, as well as the outputs of the pretrained BERT and RoBERTa/XLM-RoBERTa models, as input to the neural models. We present a comprehensive analysis of eleven tokenization approaches, including the commonly used subword methods and morphologically motivated segmentations. The experiments are conducted on three English and two Turkish datasets from different domains. The results show that the BERT- and RoBERTa-/XLM-RoBERTa-based models and the contextualized embeddings outperform the other neural models. We also observe that using words in raw or preprocessed form, stemming the words, and applying WordPiece tokenization give the most promising results for the sentiment analysis task. Finally, we ensemble the models to find out which tokenization approaches perform better in combination.
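To make the tokenization comparison concrete, the following minimal Python sketch contrasts three of the approaches named above: raw word-level tokens, stemmed tokens, and WordPiece subwords. The example sentence, the `bert-base-uncased` checkpoint, and the NLTK Porter stemmer are illustrative assumptions, not the paper's actual pipeline or datasets.

```python
# Illustrative comparison of three tokenization approaches evaluated in the paper:
# raw words, stemmed words, and WordPiece subwords.
# The sentence and checkpoint below are placeholders, not the paper's setup.
from nltk.stem import PorterStemmer      # pip install nltk
from transformers import AutoTokenizer   # pip install transformers

sentence = "The delivery was unbelievably slow, but the product itself is great."

# 1) Raw word-level tokens (simple lowercasing and punctuation stripping).
raw_tokens = sentence.lower().replace(",", "").replace(".", "").split()

# 2) Stemmed tokens: reduce each word to its stem before building the vocabulary.
stemmer = PorterStemmer()
stemmed_tokens = [stemmer.stem(t) for t in raw_tokens]

# 3) WordPiece subword tokens, as used by BERT-style models.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
wordpiece_tokens = tokenizer.tokenize(sentence)

print("raw:      ", raw_tokens)
print("stemmed:  ", stemmed_tokens)
print("wordpiece:", wordpiece_tokens)  # e.g. 'unbelievably' -> 'un', '##bel', ...
```

Each strategy yields a different vocabulary over the same review text, which is what makes the choice of tokenizer consequential for the downstream sentiment classifier.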