AraXLNet: pre-trained language model for sentiment analysis of Arabic
Abstract The Arabic language is complex and has few resources, which makes it challenging to perform text classification tasks such as sentiment analysis accurately. The main goal of sentiment analysis is to determine the overall orientation of a given text in terms of wh...
Main Authors: | Alhanouf Alduailej, Abdulrahman Alothaim |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2022-05-01 |
Series: | Journal of Big Data |
Subjects: | Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining |
Online Access: | https://doi.org/10.1186/s40537-022-00625-z |
_version_ | 1818553909295185920 |
---|---|
author | Alhanouf Alduailej; Abdulrahman Alothaim |
author_facet | Alhanouf Alduailej; Abdulrahman Alothaim |
author_sort | Alhanouf Alduailej |
collection | DOAJ |
description | Abstract The Arabic language is complex and has few resources, which makes it challenging to perform text classification tasks such as sentiment analysis accurately. The main goal of sentiment analysis is to determine the overall orientation of a given text: positive, negative, or neutral. Recently, language models have shown great results in improving the accuracy of text classification in English. The models are pre-trained on a large dataset and then fine-tuned on the downstream tasks. In particular, XLNet has achieved state-of-the-art results for diverse natural language processing (NLP) tasks in English. In this paper, we hypothesize that comparable success can be achieved in Arabic. The paper aims to support this hypothesis by producing the first XLNet-based language model in Arabic, called AraXLNet, and demonstrating its use in Arabic sentiment analysis to improve the prediction accuracy of such tasks. The results showed that the proposed model, AraXLNet, with the Farasa segmenter achieved accuracies of 94.78%, 93.01%, and 85.77% on the Arabic sentiment analysis task using multiple benchmark datasets. These results outperformed AraBERT, which obtained 84.65%, 92.13%, and 85.05% on the same datasets, respectively. The improved accuracy of the proposed model was evident on multiple benchmark datasets, thus offering a promising advancement in Arabic text classification tasks. |
first_indexed | 2024-12-12T09:31:58Z |
format | Article |
id | doaj.art-827c51c563854e4b91126afbc9b4bebe |
institution | Directory Open Access Journal |
issn | 2196-1115 |
language | English |
last_indexed | 2024-12-12T09:31:58Z |
publishDate | 2022-05-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Big Data |
spelling | doaj.art-827c51c563854e4b91126afbc9b4bebe 2022-12-22T00:28:50Z eng SpringerOpen Journal of Big Data 2196-1115 2022-05-01 91121 10.1186/s40537-022-00625-z AraXLNet: pre-trained language model for sentiment analysis of Arabic Alhanouf Alduailej (0); Abdulrahman Alothaim (1) (0) Department of Information Systems, College of Computer and Information Sciences, King Saud University (1) Department of Information Systems, College of Computer and Information Sciences, King Saud University https://doi.org/10.1186/s40537-022-00625-z Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining |
spellingShingle | Alhanouf Alduailej Abdulrahman Alothaim AraXLNet: pre-trained language model for sentiment analysis of Arabic Journal of Big Data Sentiment analysis Language models NLP XLNet AraXLNet Text mining |
title | AraXLNet: pre-trained language model for sentiment analysis of Arabic |
title_sort | araxlnet pre trained language model for sentiment analysis of arabic |
topic | Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining |
url | https://doi.org/10.1186/s40537-022-00625-z |
work_keys_str_mv | AT alhanoufalduailej araxlnetpretrainedlanguagemodelforsentimentanalysisofarabic AT abdulrahmanalothaim araxlnetpretrainedlanguagemodelforsentimentanalysisofarabic |