Arabic Language Opinion Mining Based on Long Short-Term Memory (LSTM)

Arabic is one of the official languages recognized by the United Nations (UN) and is widely used in the Middle East, parts of Asia and Africa, and elsewhere. Social media activity currently dominates textual communication on the Internet and potentially represents people’s views about spe...

Bibliographic Details
Main Authors: Arief Setyanto, Arif Laksito, Fawaz Alarfaj, Mohammed Alreshoodi, Kusrini, Irwan Oyong, Mardhiya Hayaty, Abdullah Alomair, Naif Almusallam, Lilis Kurniasari
Format: Article
Language: English
Published: MDPI AG 2022-04-01
Series: Applied Sciences
Subjects: sentiment analysis, opinion mining, Neural Network, LSTM, Arabic
Online Access: https://www.mdpi.com/2076-3417/12/9/4140
collection DOAJ
description Arabic is one of the official languages recognized by the United Nations (UN) and is widely used in the Middle East, parts of Asia and Africa, and elsewhere. Social media activity currently dominates textual communication on the Internet and potentially represents people’s views on specific issues. Opinion mining is an important task for understanding the polarity of public opinion towards an issue. Understanding public opinion leads to better decisions in many fields, such as public services and business. Language background plays a vital role in understanding opinion polarity: variation arises not only from vocabulary but also from cultural background. A sentence is a sequence, much like a time series signal, so word order contributes significantly to the meaning of the text. A recurrent neural network (RNN) is a deep learning architecture that takes sequence into account. Long short-term memory (LSTM) is an RNN variant whose gates keep or ignore specific word signals over a sequence of inputs. Text is unstructured data and cannot be processed by a machine until an algorithm transforms it into a machine-readable representation, namely a vector of numerical values. Transformation algorithms range from the Term Frequency–Inverse Document Frequency (TF-IDF) transform to advanced word embeddings. Word embedding methods include GloVe, word2vec, BERT, and fastText. This research experimented with those algorithms to perform vector transformation of the Arabic text dataset. This study implements and compares the GloVe and fastText word embedding algorithms with LSTM in single-, double-, and triple-layer architectures. Finally, it compares their accuracy for opinion mining on an Arabic dataset, evaluating the proposed approach on the ASAD dataset of 55,000 annotated tweets in three classes. The dataset was augmented to achieve equal proportions of the positive, negative, and neutral classes. According to the evaluation results, the triple-layer LSTM with fastText word embedding achieved the best testing accuracy, 90.9%, surpassing all other experimental scenarios.
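The gating behaviour mentioned in the abstract can be illustrated with a minimal sketch. The Python below is not the authors' implementation: it runs a single LSTM step for one hidden unit using only the standard library, and the names `lstm_step`, `run_sequence`, the `params` layout, and all weight values are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function used by the three LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a single hidden unit (scalar state).

    x       list of input features, e.g. one word-embedding vector
    h_prev  previous hidden state (float)
    c_prev  previous cell state (float)
    params  hypothetical layout: gate name -> (input_weights, recurrent_weight, bias)
    """
    def pre(gate):
        w, u, b = params[gate]
        return sum(wi * xi for wi, xi in zip(w, x)) + u * h_prev + b

    f = sigmoid(pre("forget"))  # share of the old cell state to keep
    i = sigmoid(pre("input"))   # share of the new candidate to write
    o = sigmoid(pre("output"))  # share of the cell state to expose
    g = math.tanh(pre("cand"))  # candidate value from the current word
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_sequence(seq, params):
    """Feed a sequence of input vectors through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in seq:
        h, c = lstm_step(x, h, c, params)
    return h, c


# Illustrative (made-up) weights: a saturated forget gate and a closed input
# gate make the cell ignore the current word and keep its previous memory.
keep_memory = {
    "forget": ([0.0], 0.0, 20.0),
    "input": ([0.0], 0.0, -20.0),
    "output": ([0.0], 0.0, 0.0),
    "cand": ([1.0], 0.0, 0.0),
}
# The opposite setting discards the old memory and stores the current word.
overwrite_memory = dict(
    keep_memory,
    forget=([0.0], 0.0, -20.0),
    input=([0.0], 0.0, 20.0),
)
```

With `keep_memory`, calling `lstm_step([0.5], 0.0, 0.7, keep_memory)` leaves the cell state essentially at its previous value, while `overwrite_memory` replaces it with the candidate derived from the current input. A trained model would learn these weights from data; stacking such layers gives the single-, double-, and triple-layer variants the abstract compares.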
first_indexed 2024-03-10T04:22:46Z
id doaj.art-4e7f92c630214b27a452f943c1ac579a
institution Directory of Open Access Journals
issn 2076-3417
last_indexed 2024-03-10T04:22:46Z
spelling doaj.art-4e7f92c630214b27a452f943c1ac579a; 2023-11-23T07:44:51Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2022-04-01; volume 12, issue 9, article 4140; DOI 10.3390/app12094140
Author affiliations:
Arief Setyanto: Magister of Informatics Engineering, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia
Arif Laksito: Faculty of Computer Science, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia
Fawaz Alarfaj: Department of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia
Mohammed Alreshoodi: Department of Natural Applied Science, Applied College, Qassim University, Buraydah 52571, Saudi Arabia
Kusrini: Magister of Informatics Engineering, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia
Irwan Oyong: Faculty of Computer Science, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia
Mardhiya Hayaty: Faculty of Computer Science, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia
Abdullah Alomair: Department of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia
Naif Almusallam: Department of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia
Lilis Kurniasari: Department of Electrical Engineering, Universitas Nahdlatul Ulama Yogyakarta, Yogyakarta 55162, Indonesia
topic sentiment analysis
opinion mining
Neural Network
LSTM
Arabic