AraXLNet: pre-trained language model for sentiment analysis of Arabic

Abstract The Arabic language is complex and has few resources, which makes accurate text classification tasks such as sentiment analysis challenging. The main goal of sentiment analysis is to determine the overall orientation of a given text in terms of wh...


Bibliographic Details
Main Authors: Alhanouf Alduailej, Abdulrahman Alothaim
Format: Article
Language: English
Published: SpringerOpen 2022-05-01
Series: Journal of Big Data
Subjects: Sentiment analysis, Language models, NLP, XLNet, AraXLNet, Text mining
Online Access: https://doi.org/10.1186/s40537-022-00625-z
author Alhanouf Alduailej
Abdulrahman Alothaim
collection DOAJ
description Abstract The Arabic language is complex and has few resources, which makes accurate text classification tasks such as sentiment analysis challenging. The main goal of sentiment analysis is to determine the overall orientation of a given text in terms of whether it is positive, negative, or neutral. Recently, language models have substantially improved the accuracy of text classification in English. These models are pre-trained on a large dataset and then fine-tuned on downstream tasks. In particular, XLNet has achieved state-of-the-art results on diverse natural language processing (NLP) tasks in English. In this paper, we hypothesize that comparable success can be achieved in Arabic. The paper aims to support this hypothesis by producing the first XLNet-based language model for Arabic, called AraXLNet, and demonstrating its use in Arabic sentiment analysis to improve prediction accuracy. The results showed that the proposed model, AraXLNet, with the Farasa segmenter achieved accuracies of 94.78%, 93.01%, and 85.77% on the sentiment analysis task for Arabic using multiple benchmark datasets. These results outperformed AraBERT, which obtained 84.65%, 92.13%, and 85.05% on the same datasets, respectively. The improved accuracy of the proposed model was evident across multiple benchmark datasets, offering a promising advance in Arabic text classification tasks.
first_indexed 2024-12-12T09:31:58Z
format Article
id doaj.art-827c51c563854e4b91126afbc9b4bebe
institution Directory Open Access Journal
issn 2196-1115
language English
last_indexed 2024-12-12T09:31:58Z
publishDate 2022-05-01
publisher SpringerOpen
record_format Article
series Journal of Big Data
spelling doaj.art-827c51c563854e4b91126afbc9b4bebe; 2022-12-22T00:28:50Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2022-05-01; 91121; 10.1186/s40537-022-00625-z; AraXLNet: pre-trained language model for sentiment analysis of Arabic; Alhanouf Alduailej [0]; Abdulrahman Alothaim [1]; [0] Department of Information Systems, College of Computer and Information Sciences, King Saud University; [1] Department of Information Systems, College of Computer and Information Sciences, King Saud University; https://doi.org/10.1186/s40537-022-00625-z; Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining
title AraXLNet: pre-trained language model for sentiment analysis of Arabic
topic Sentiment analysis
Language models
NLP
XLNet
AraXLNet
Text mining
url https://doi.org/10.1186/s40537-022-00625-z