The Effectiveness of Arabic Stemmers Using Arabized Word Removal

Bibliographic Details
Main Authors: Hamood ALshalabi, Sabrina Tiun, Nazlia Omar, Kamal Ali Alezabi, Fatima N. AL-Aswadi
Format: Article
Language: English
Published: Regional Information Center for Science and Technology (RICeST), 2022-10-01
Series: International Journal of Information Science and Management
Online Access: https://ijism.ricest.ac.ir/index.php/ijism/article/view/2151
ISSN: 2008-8302, 2008-8310
Author affiliation (Hamood ALshalabi): CAIT, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor
Collection: DOAJ
Keywords: Arabised word, natural language processing, Arabised words removal, Arabic text pre-processing, Arabic stemming, text processing, Arabic language
Persistent link: https://dorl.net/dor/20.1001.1.20088302.2022.20.4.6.5

Abstract

Other languages have influenced Arabic through several factors, such as geographical proximity, trade, past Islamic conquests, science and technology, new devices, brand names, models, and fashion. As a result, foreign words, known as Arabised words, appear in Arabic text. Arabised words complicate Arabic natural language processing (NLP) tasks because they make it more difficult to identify the correct stem or root of an Arabic word. More efficient Arabic NLP can therefore be developed if Arabised word removal is part of pre-processing. In this paper, we propose an algorithm that detects and extracts Arabised words as a pre-processing step for Arabic stemming. The algorithm combines lexicon-based and rule-based approaches: the lexicon was developed from various Arabic text resources, and the rules handle Arabised words carrying definite articles and use pattern matching on prefixes and suffixes. To evaluate the effect of the proposed algorithm on an Arabic NLP task, we apply Arabised word removal as a pre-processing step for Arabic stemmers. Three Arabic stemmers, namely light stemming, condition light, and ARLS, are evaluated on three standard Arabic datasets. The stemmers are compared by measuring precision, recall, and IFC with and without the proposed Arabised word removal pre-processing. The results show that performance improves for all the stemmers when Arabised word removal is included in pre-processing. Therefore, Arabic NLP applications and tasks, mainly Arabic stemming, can be made more efficient by including Arabised word removal in the pre-processing stage.
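The abstract describes the detector only at a high level: a lexicon combined with rules that handle definite articles and match prefixes and suffixes. The following is a minimal Python sketch of that kind of pre-processing filter, assuming that the rules generate candidate base forms and the lexicon makes the final decision. The lexicon entries, the prefix and suffix lists, and the function names (candidate_bases, is_arabised, remove_arabised) are hypothetical illustrations, not the authors' resources or code.

```python
"""Hypothetical sketch of lexicon + rule-based Arabised word removal.

The lexicon, prefix list and suffix list are illustrative assumptions,
not the resources used in the paper.
"""
from __future__ import annotations

# Hypothetical mini-lexicon of Arabised (borrowed) words in base form.
ARABISED_LEXICON = {
    "كمبيوتر",  # computer
    "تلفزيون",  # television
    "انترنت",   # internet
    "فيديو",    # video
    "راديو",    # radio
}

# Assumed prefixes: the definite article "ال" plus common clitic + article
# combinations. Every matching prefix is tried, so the order is cosmetic.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]

# Assumed suffixes (e.g. feminine plural and nisba endings).
SUFFIXES = ["ات", "ية", "ين", "ان", "ي"]


def candidate_bases(word: str) -> set[str]:
    """Rule step: the word plus every form left after stripping one prefix
    and/or one suffix, keeping at least two letters of the remainder."""
    stems = {word}
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix) + 1:
            stems.add(word[len(prefix):])
    # Iterate over a snapshot so newly added forms are not re-stripped.
    for stem in list(stems):
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) > len(suffix) + 1:
                stems.add(stem[: -len(suffix)])
    return stems


def is_arabised(word: str) -> bool:
    """Lexicon step: a word is Arabised if any candidate base is listed."""
    return any(base in ARABISED_LEXICON for base in candidate_bases(word))


def remove_arabised(tokens: list[str]) -> list[str]:
    """Drop Arabised tokens so a stemmer only sees the remaining words."""
    return [token for token in tokens if not is_arabised(token)]


if __name__ == "__main__":
    tokens = ["الكمبيوتر", "في", "المكتبة", "والانترنت"]
    print(remove_arabised(tokens))
```

In this sketch, "الكمبيوتر" and "والانترنت" are removed because stripping the article or clitic-plus-article prefix yields a lexicon entry, while "في" and "المكتبة" pass through to whatever stemmer follows. Letting the rules only propose candidates, rather than decide on their own, keeps a native word from being removed unless one of its stripped forms is actually listed in the lexicon.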