Analysis of Deep Learning Model Combinations and Tokenization Approaches in Sentiment Classification

Sentiment classification is a natural language processing task to identify opinions expressed in texts such as product or service reviews. In this work, we analyze the effects of different deep-learning model combinations, embedding methods, and tokenization approaches in sentiment classification. W...

Bibliographic Details
Main Authors:	Ali Erkan, Tunga Gungor
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Machine learning; deep neural networks; natural language processing; sentiment classification; word embedding; tokenization
Online Access:	https://ieeexplore.ieee.org/document/10332170/

author	Ali Erkan, Tunga Gungor
collection	DOAJ
description	Sentiment classification is a natural language processing task to identify opinions expressed in texts such as product or service reviews. In this work, we analyze the effects of different deep-learning model combinations, embedding methods, and tokenization approaches in sentiment classification. We feed non-contextualized (Word2Vec and GloVe) and contextualized (BERT and RoBERTa/XLM-RoBERTa) embeddings and also the output of the pretrained BERT and RoBERTa/XLM-RoBERTa models as input to neural models. We make a comprehensive analysis of eleven different tokenization approaches, including the commonly used subword methods and morphologically motivated segmentations. The experiments are conducted on three English and two Turkish datasets from different domains. The results show that BERT- and RoBERTa-/XLM-RoBERTa-based and contextualized embeddings outperform other neural models. We also observe that using words in raw or preprocessed form, stemming the words, and applying WordPiece tokenizations give the most promising results in the sentiment analysis task. We ensemble the models to find out which tokenization approaches produce better results together.
first_indexed	2024-03-08T19:37:07Z
format	Article
id	doaj.art-3319d906644d4c0b90dd52623fca0003
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-08T19:37:07Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
doi	10.1109/ACCESS.2023.3337354
volume	11
pages	134951-134968
document_id	10332170
author_orcid	Ali Erkan: https://orcid.org/0000-0003-0125-8110; Tunga Gungor: https://orcid.org/0000-0001-9448-9422
author_affiliation	Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey (both authors)
record_timestamp	2023-12-26T00:06:30Z
title	Analysis of Deep Learning Model Combinations and Tokenization Approaches in Sentiment Classification
topic	Machine learning; deep neural networks; natural language processing; sentiment classification; word embedding; tokenization
url	https://ieeexplore.ieee.org/document/10332170/

Similar Items