Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning

Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing user opinions and applying these to guide choic...


Bibliographic Details
Main Authors: Nasrin Elhassan, Giuseppe Varone, Rami Ahmed, Mandar Gogate, Kia Dashtipour, Hani Almoamari, Mohammed A. El-Affendi, Bassam Naji Al-Tamimi, Faisal Albalwy, Amir Hussain
Format: Article
Language: English
Published: MDPI AG 2023-06-01
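Series: Computers
Subjects: Arabic Sentiment Analysis; Word2Vec; FastText; convolutional neural networks; long short-term memory; recurrent neural networks
Online Access: https://www.mdpi.com/2073-431X/12/6/126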
author Nasrin Elhassan
Giuseppe Varone
Rami Ahmed
Mandar Gogate
Kia Dashtipour
Hani Almoamari
Mohammed A. El-Affendi
Bassam Naji Al-Tamimi
Faisal Albalwy
Amir Hussain
collection DOAJ
description Social media networks have grown exponentially over the last two decades, giving internet users the opportunity to communicate and exchange ideas on a variety of topics. As a result, opinion mining plays a crucial role in analyzing user opinions and applying them to guide choices, making it one of the most popular research areas in natural language processing. Although several languages, including English, have been studied, comparatively little work has been conducted on Arabic. The morphological complexity and many dialects of the language make semantic analysis particularly challenging. Moreover, the lack of accurate pre-processing tools and limited resources are constraining factors. This study was motivated by the accomplishments of deep learning algorithms and word embeddings in English sentiment analysis. Extensive experiments were conducted based on supervised machine learning in which word embeddings were exploited to determine the sentiment of Arabic reviews. Three deep learning models were introduced: convolutional neural networks (CNNs), long short-term memory (LSTM), and a hybrid CNN-LSTM. The models used features learned by word embeddings such as Word2Vec and fastText rather than hand-crafted features. The models were tested on two benchmark Arabic datasets, the Hotel Arabic Reviews Dataset (HARD) for hotel reviews and Large-Scale Arabic Book Reviews (LABR) for book reviews, with different setups. Comparative experiments applied the three models with the two word embeddings to different setups of the datasets. The main novelty of this study is its exploration of the effectiveness of various word embeddings and of different benchmark dataset setups: balanced versus imbalanced, and binary versus multi-class classification.
Findings showed that, in most cases, the best results were obtained with the fastText word embedding on the HARD 2-imbalance dataset for all three proposed models: CNN, LSTM, and CNN-LSTM. Further, on the benchmark HARD dataset with fastText, the proposed CNN model outperformed the LSTM and CNN-LSTM models, with accuracies of 94.69%, 94.63%, and 94.54%, respectively. Although the worst results were obtained on the LABR 3-imbalance dataset with both Word2Vec and fastText, they still outperformed other researchers’ state-of-the-art results on the same dataset.
first_indexed 2024-03-11T02:37:16Z
format Article
id doaj.art-cc0d0d46e8674dcbb0fee7443f2a7ec3
institution Directory Open Access Journal
issn 2073-431X
language English
last_indexed 2024-03-11T02:37:16Z
publishDate 2023-06-01
publisher MDPI AG
record_format Article
series Computers
spelling doaj.art-cc0d0d46e8674dcbb0fee7443f2a7ec3
2023-11-18T09:54:24Z
eng
MDPI AG
Computers
2073-431X
2023-06-01
Volume 12, Issue 6, Article 126
10.3390/computers12060126
Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning
Nasrin Elhassan: College of Computer Sciences and Information Technology, Sudan University of Science and Technology, Khartoum P.O. Box 407, Sudan
Giuseppe Varone: Department of Physical Therapy, Movement and Rehabilitation Science, Northeastern University, Boston, MA 02115, USA
Rami Ahmed: College of Computer Sciences and Information Technology, Sudan University of Science and Technology, Khartoum P.O. Box 407, Sudan
Mandar Gogate: School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK
Kia Dashtipour: School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK
Hani Almoamari: Faculty of Computer and Information Systems, Islamic University of Madinah, Medina 42351, Saudi Arabia
Mohammed A. El-Affendi: Department of Computer Science, College of Computer and Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia
Bassam Naji Al-Tamimi: School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
Faisal Albalwy: Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah 42353, Saudi Arabia
Amir Hussain: School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK
title Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning
topic Arabic Sentiment Analysis
Word2Vec
FastText
convolutional neural networks
long short-term memory
recurrent neural networks
url https://www.mdpi.com/2073-431X/12/6/126
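The dataset setups named in the description (binary versus multi-class, balanced versus imbalanced) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the star-rating thresholds, the helper names `to_label` and `build_setup`, the toy reviews, and the use of random undersampling to balance classes.

```python
import random
from collections import Counter, defaultdict


def to_label(rating, n_classes):
    """Map a 1-5 star rating to a sentiment label (thresholds are assumed)."""
    if n_classes not in (2, 3):
        raise ValueError("n_classes must be 2 or 3")
    if rating == 3:
        # Assumption: 3-star reviews are dropped in the binary setup
        # and treated as "neutral" in the 3-class setup.
        return "neutral" if n_classes == 3 else None
    return "positive" if rating > 3 else "negative"


def build_setup(reviews, n_classes, balanced, seed=0):
    """Return (text, label) pairs for one setup, e.g. a '2-imbalance' variant."""
    labeled = [(text, to_label(rating, n_classes)) for text, rating in reviews]
    labeled = [(text, label) for text, label in labeled if label is not None]
    if not balanced or not labeled:
        return labeled

    # Balance by random undersampling: shrink every class to the smallest one.
    by_class = defaultdict(list)
    for text, label in labeled:
        by_class[label].append((text, label))
    smallest = min(len(items) for items in by_class.values())
    rng = random.Random(seed)
    sampled = []
    for items in by_class.values():
        sampled.extend(rng.sample(items, smallest))
    return sampled


# Hypothetical toy data: (review text, star rating).
reviews = [("r1", 5), ("r2", 5), ("r3", 4), ("r4", 3),
           ("r5", 1), ("r6", 2), ("r7", 5)]

imbalanced = build_setup(reviews, n_classes=2, balanced=False)
balanced = build_setup(reviews, n_classes=2, balanced=True)
print(Counter(label for _, label in imbalanced))
print(Counter(label for _, label in balanced))
```

In this sketch the imbalanced binary setup keeps all non-neutral reviews, while the balanced one discards surplus majority-class reviews, which is one plausible reason the two setups can yield different accuracy figures for the same model.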