A study of the performance of embedding methods for Arabic short-text sentiment analysis using deep learning approaches


Bibliographic Details
Main Authors: Ali Alwehaibi, Marwan Bikdash, Mohammad Albogmi, Kaushik Roy
Format: Article
Language: English
Published: Elsevier 2022-09-01
Series: Journal of King Saud University: Computer and Information Sciences
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157821001786
Description
Summary: Sentiment analysis aims to classify a text according to the polarity of the opinion it expresses, such as positive, negative, or neutral. While most studies focus on eliciting features from English text, research on Arabic is limited due to the morphological and grammatical complexity of the Arabic language. In this paper, we proposed an optimized sentiment classification for dialectal Arabic short text at the document level using deep learning (DL). The contributions of this paper are in three areas. First, we extracted semantic features for Arabic short text at the word level and character level. Second, we used three DL topologies for classification models: a long short-term memory recurrent neural network (LSTM); a convolutional neural network (CNN); and an ensemble model combining both models’ advantages to improve prediction performance. Third, we used a hyperparameter tuning estimation method to optimize the neural network performance. We trained and tested our proposed models on a corpus of Modern Standard Arabic and dialectal Arabic collected from Twitter. The results showed significant improvement in Arabic text classification in terms of classification accuracy, which ranges between 88% and 69.7%. The ensemble model scored the highest accuracy of 96.7% on the test set.
ISSN: 1319-1578
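
The summary mentions features at the word level and character level and an ensemble of an LSTM and a CNN. The sketch below illustrates those two ideas in plain Python under stated assumptions: the function names (`word_tokens`, `char_tokens`, `build_vocab`, `encode`, `soft_vote`) are hypothetical, whitespace splitting stands in for whatever tokenizer the authors used, and averaging class probabilities (soft voting) is only one possible way to combine two models, since the record does not say how the ensemble was built.

```python
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # assumed reserved ids for padding and unknown tokens


def word_tokens(text):
    """Word-level view: split on whitespace (an illustrative simplification)."""
    return text.split()


def char_tokens(text):
    """Character-level view: every non-space character is a token."""
    return [c for c in text if not c.isspace()]


def build_vocab(token_lists):
    """Assign integer ids to tokens, most frequent first, ties broken by token."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Turn tokens into a fixed-length id sequence (truncate, then pad)."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def soft_vote(probs_a, probs_b):
    """Average two models' class probabilities and return the winning class index."""
    avg = [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
    return max(range(len(avg)), key=avg.__getitem__)
```

In a setup like the one summarized, the id sequences from `encode` would feed an embedding layer in front of each network, and `soft_vote` would be applied to the two networks' output probabilities for each text.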