AraXLNet: pre-trained language model for sentiment analysis of Arabic

Abstract: Arabic is a complex language with few resources, which makes it challenging to perform text classification tasks such as sentiment analysis accurately. The main goal of sentiment analysis is to determine the overall orientation of a given text in terms of wh...

Bibliographic Details
Main Authors:	Alhanouf Alduailej, Abdulrahman Alothaim
Format:	Article
Language:	English
Published:	SpringerOpen 2022-05-01
Series:	Journal of Big Data
Subjects:	Sentiment analysis Language models NLP XLNet AraXLNet Text mining
Online Access:	https://doi.org/10.1186/s40537-022-00625-z

author	Alhanouf Alduailej Abdulrahman Alothaim
collection	DOAJ
description	Abstract: Arabic is a complex language with few resources, which makes it challenging to perform text classification tasks such as sentiment analysis accurately. The main goal of sentiment analysis is to determine the overall orientation of a given text: positive, negative, or neutral. Recently, language models have shown great results in improving the accuracy of text classification in English. These models are pre-trained on a large dataset and then fine-tuned on downstream tasks. In particular, XLNet has achieved state-of-the-art results on diverse natural language processing (NLP) tasks in English. In this paper, we hypothesize that comparable success can be achieved in Arabic. The paper aims to support this hypothesis by producing the first XLNet-based language model for Arabic, called AraXLNet, and demonstrating its use in Arabic sentiment analysis to improve prediction accuracy. The results showed that the proposed model, AraXLNet, with the Farasa segmenter achieved accuracies of 94.78%, 93.01%, and 85.77% on Arabic sentiment analysis across multiple benchmark datasets. It outperformed AraBERT, which obtained 84.65%, 92.13%, and 85.05% on the same datasets, respectively. The improvement was evident across multiple benchmark datasets, offering a promising advance for Arabic text classification tasks.
first_indexed	2024-12-12T09:31:58Z
format	Article
id	doaj.art-827c51c563854e4b91126afbc9b4bebe
institution	Directory of Open Access Journals
issn	2196-1115
language	English
last_indexed	2024-12-12T09:31:58Z
publishDate	2022-05-01
publisher	SpringerOpen
record_format	Article
series	Journal of Big Data
spelling	doaj.art-827c51c563854e4b91126afbc9b4bebe; 2022-12-22T00:28:50Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2022-05-01; 91121; 10.1186/s40537-022-00625-z; AraXLNet: pre-trained language model for sentiment analysis of Arabic; Alhanouf Alduailej 0; Abdulrahman Alothaim 1; Department of Information Systems, College of Computer and Information Sciences, King Saud University; Department of Information Systems, College of Computer and Information Sciences, King Saud University; https://doi.org/10.1186/s40537-022-00625-z; Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining
title	AraXLNet: pre-trained language model for sentiment analysis of Arabic
topic	Sentiment analysis Language models NLP XLNet AraXLNet Text mining
url	https://doi.org/10.1186/s40537-022-00625-z

Similar Items