Arabic Language Opinion Mining Based on Long Short-Term Memory (LSTM)
Arabic is one of the official languages recognized by the United Nations (UN) and is widely used in the Middle East and in parts of Asia, Africa, and other countries. Social media activity currently dominates textual communication on the Internet and potentially represents people’s views about spe...
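Main Authors: | Arief Setyanto, Arif Laksito, Fawaz Alarfaj, Mohammed Alreshoodi, Kusrini, Irwan Oyong, Mardhiya Hayaty, Abdullah Alomair, Naif Almusallam, Lilis Kurniasari |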
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2022-04-01 |
Series: | Applied Sciences |
Subjects: | sentiment analysis, opinion mining, Neural Network, LSTM, Arabic |
Online Access: | https://www.mdpi.com/2076-3417/12/9/4140 |
_version_ | 1797505757323722752 |
---|---|
author | Arief Setyanto, Arif Laksito, Fawaz Alarfaj, Mohammed Alreshoodi, Kusrini, Irwan Oyong, Mardhiya Hayaty, Abdullah Alomair, Naif Almusallam, Lilis Kurniasari |
author_facet | Arief Setyanto, Arif Laksito, Fawaz Alarfaj, Mohammed Alreshoodi, Kusrini, Irwan Oyong, Mardhiya Hayaty, Abdullah Alomair, Naif Almusallam, Lilis Kurniasari |
author_sort | Arief Setyanto |
collection | DOAJ |
description | Arabic is one of the official languages recognized by the United Nations (UN) and is widely used in the Middle East and in parts of Asia, Africa, and other countries. Social media activity currently dominates textual communication on the Internet and potentially represents people’s views about specific issues. Opinion mining is an important task for understanding the polarity of public opinion towards an issue. Understanding public opinion leads to better decisions in many fields, such as public services and business. Language background plays a vital role in understanding opinion polarity: variation arises not only from vocabulary but also from cultural background. A sentence is a time-series signal, so word order contributes significantly to the meaning of the text. A recurrent neural network (RNN) is a deep learning architecture that takes sequence into account. Long short-term memory (LSTM) is an RNN variant whose gates keep or ignore specific word signals as a sequence of inputs is processed. Text is unstructured data and cannot be processed by a machine until an algorithm transforms it into a machine-readable representation, namely a vector of numerical values. Transformation algorithms range from the Term Frequency–Inverse Document Frequency (TF-IDF) transform to advanced word embeddings. Word embedding methods include GloVe, word2vec, BERT, and fastText. This research experimented with those algorithms to perform vector transformation of the Arabic text dataset. This study implements and compares the GloVe and fastText word embedding algorithms with LSTM networks in single-, double-, and triple-layer architectures. Finally, this research compares their accuracy for opinion mining on an Arabic dataset, evaluating the proposed approach on the ASAD dataset of 55,000 annotated tweets in three classes. The dataset was augmented to achieve equal proportions of positive, negative, and neutral classes. According to the evaluation results, the triple-layer LSTM with fastText word embedding achieved the best testing accuracy, at 90.9%, surpassing all other experimental scenarios. |
first_indexed | 2024-03-10T04:22:46Z |
format | Article |
id | doaj.art-4e7f92c630214b27a452f943c1ac579a |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T04:22:46Z |
publishDate | 2022-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-4e7f92c630214b27a452f943c1ac579a; 2023-11-23T07:44:51Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-04-01; 12; 9; 4140; 10.3390/app12094140; Arabic Language Opinion Mining Based on Long Short-Term Memory (LSTM); Arief Setyanto [0]; Arif Laksito [1]; Fawaz Alarfaj [2]; Mohammed Alreshoodi [3]; Kusrini [4]; Irwan Oyong [5]; Mardhiya Hayaty [6]; Abdullah Alomair [7]; Naif Almusallam [8]; Lilis Kurniasari [9]; Magister of Informatics Engineering, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia; Faculty of Computer Science, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia; Department of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia; Department of Natural Applied Science, Applied College, Qassim University, Buraydah 52571, Saudi Arabia; Magister of Informatics Engineering, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia; Faculty of Computer Science, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia; Faculty of Computer Science, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia; Department of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia; Department of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia; Departemen of Electrical Engineering, Universitas Nahdlatul Ulama Yogyakarta, Yogyakarta 55162, Indonesia; https://www.mdpi.com/2076-3417/12/9/4140; sentiment analysis; opinion mining; Neural Network; LSTM; Arabic |
title | Arabic Language Opinion Mining Based on Long Short-Term Memory (LSTM) |
title_sort | arabic language opinion mining based on long short term memory lstm |
topic | sentiment analysis, opinion mining, Neural Network, LSTM, Arabic |
url | https://www.mdpi.com/2076-3417/12/9/4140 |
work_keys_str_mv | AT ariefsetyanto arabiclanguageopinionminingbasedonlongshorttermmemorylstm AT ariflaksito arabiclanguageopinionminingbasedonlongshorttermmemorylstm AT fawazalarfaj arabiclanguageopinionminingbasedonlongshorttermmemorylstm AT mohammedalreshoodi arabiclanguageopinionminingbasedonlongshorttermmemorylstm AT kusrini arabiclanguageopinionminingbasedonlongshorttermmemorylstm AT irwanoyong arabiclanguageopinionminingbasedonlongshorttermmemorylstm AT mardhiyahayaty arabiclanguageopinionminingbasedonlongshorttermmemorylstm AT abdullahalomair arabiclanguageopinionminingbasedonlongshorttermmemorylstm AT naifalmusallam arabiclanguageopinionminingbasedonlongshorttermmemorylstm AT liliskurniasari arabiclanguageopinionminingbasedonlongshorttermmemorylstm |
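
Note on the method named in the description: the abstract says that LSTM gates keep or ignore word signals across a sequence and that the study compares single-, double-, and triple-layer LSTM architectures. The sketch below illustrates that idea in plain Python under stated assumptions. It is not the authors' implementation: this record does not give their framework, layer sizes, or training setup. The function names (`make_layer`, `lstm_step`, `run_stack`, `build_stack`) are hypothetical, the weights are random and untrained, and the input vectors are placeholder numbers standing in for fastText or GloVe word embeddings.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(input_size, hidden_size, rng):
    """Create random (untrained) weights for one LSTM layer, one (W, b) pair per gate."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    # f = forget gate, i = input gate, o = output gate, g = candidate cell values
    return {gate: (matrix(), [0.0] * hidden_size) for gate in ("f", "i", "o", "g")}


def affine(weights, bias, vec):
    """Compute W @ vec + b using plain lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def lstm_step(layer, x, h_prev, c_prev):
    """Process one word vector x given the previous hidden and cell states."""
    z = x + h_prev  # list concatenation of the input and the previous hidden state
    f = [sigmoid(a) for a in affine(*layer["f"], z)]    # how much old memory to keep
    i = [sigmoid(a) for a in affine(*layer["i"], z)]    # how much new signal to admit
    o = [sigmoid(a) for a in affine(*layer["o"], z)]    # how much memory to expose
    g = [math.tanh(a) for a in affine(*layer["g"], z)]  # candidate new memory
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def run_stack(layers, sequence, hidden_size):
    """Feed a sequence of word vectors through stacked LSTM layers.

    Each layer consumes the full output sequence of the layer below it.
    Returns the final hidden state of the top layer.
    """
    inputs = sequence
    for layer in layers:
        h = [0.0] * hidden_size
        c = [0.0] * hidden_size
        outputs = []
        for x in inputs:
            h, c = lstm_step(layer, x, h, c)
            outputs.append(h)
        inputs = outputs
    return inputs[-1]


def build_stack(depth, embed_size, hidden_size, seed=0):
    """Build `depth` layers; the first takes embeddings, the rest take hidden states."""
    rng = random.Random(seed)
    sizes = [embed_size] + [hidden_size] * (depth - 1)
    return [make_layer(n, hidden_size, rng) for n in sizes]


# Placeholder "tweet": six tokens, each a 4-dimensional stand-in embedding.
tweet = [[0.1 * j for j in range(4)] for _ in range(6)]
for depth in (1, 2, 3):  # single-, double-, and triple-layer variants
    summary = run_stack(build_stack(depth, 4, 5), tweet, 5)
    print(depth, [round(v, 4) for v in summary])
```

In a full classifier, the final hidden state would typically feed a dense softmax layer over the three classes (positive, negative, neutral), and the weights would be learned from the labelled tweets rather than drawn at random.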