The Effectiveness of Arabic Stemmers Using Arabized Word Removal
Other languages have influenced Arabic because of several factors, such as geographical nearness, trade communication, past Islamic conquests, science and technology, new devices, brand names, models, and fashion. As a result of these factors, foreign words are used in Arabic text and are k...
Main Authors: | Hamood ALshalabi, Sabrina Tiun, Nazlia Omar, Kamal Ali Alezabi, Fatima N. AL-Aswadi |
---|---|
Format: | Article |
Language: | English |
Published: | Regional Information Center for Science and Technology (RICeST), 2022-10-01 |
Series: | International Journal of Information Science and Management |
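Subjects: | Arabised word, natural language processing, Arabised words removal, Arabic text pre-processing, Arabic stemming, text processing, Arabic language |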
Online Access: | https://ijism.ricest.ac.ir/index.php/ijism/article/view/2151 |
_version_ | 1798030095833628672 |
---|---|
author | Hamood ALshalabi; Sabrina Tiun; Nazlia Omar; Kamal Ali Alezabi; Fatima N. AL-Aswadi |
author_sort | Hamood ALshalabi |
collection | DOAJ |
description | Other languages have influenced Arabic through several factors, including geographical proximity, trade, past Islamic conquests, science and technology, new devices, brand names, models, and fashion. As a result, foreign words appear in Arabic text; these are known as Arabised words. Arabised words complicate Arabic natural language processing (NLP) tasks because they make it harder to identify the correct stem or root of an Arabic word. A more efficient Arabic NLP pipeline can therefore be developed if Arabised word removal is part of pre-processing. In this paper, we propose an algorithm for detecting and extracting Arabised words as a pre-processing step for Arabic stemming. The algorithm combines lexicon-based and rule-based approaches. The lexicon was compiled from various Arabic text resources, and the rule-based component handles Arabised words carrying definite articles and applies pattern matching to prefixes and suffixes. To evaluate the effect of the proposed algorithm on an Arabic NLP task, we apply it as a pre-processing step in Arabic stemmers. Three Arabic stemmers are evaluated, namely light stemming, condition light and ARLS, on three types of standard Arabic datasets. The stemmers are compared with and without Arabised word removal in terms of precision, recall and IFC. The results show that the performance of all three stemmers improves when Arabised word removal is included in pre-processing. Arabised word removal is therefore a useful pre-processing stage for Arabic NLP applications, particularly Arabic stemming. https://dorl.net/dor/20.1001.1.20088302.2022.20.4.6.5 |
first_indexed | 2024-04-11T19:34:42Z |
format | Article |
id | doaj.art-00ff681fd28543b7bf6488e014ad47fb |
institution | Directory Open Access Journal |
issn | 2008-8302, 2008-8310 |
language | English |
last_indexed | 2024-04-11T19:34:42Z |
publishDate | 2022-10-01 |
publisher | Regional Information Center for Science and Technology (RICeST) |
record_format | Article |
series | International Journal of Information Science and Management |
spelling | doaj.art-00ff681fd28543b7bf6488e014ad47fb; 2022-12-22T04:06:53Z; eng; Regional Information Center for Science and Technology (RICeST); International Journal of Information Science and Management; 2008-8302, 2008-8310; 2022-10-01; 20487102452; The Effectiveness of Arabic Stemmers Using Arabized Word Removal; Hamood ALshalabi, Sabrina Tiun, Nazlia Omar, Kamal Ali Alezabi, Fatima N. AL-Aswadi; CAIT, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi Selangor; https://dorl.net/dor/20.1001.1.20088302.2022.20.4.6.5; https://ijism.ricest.ac.ir/index.php/ijism/article/view/2151; arabised word, natural language processing, arabised words removal, arabic text pre-processing, arabic stemming, text processing, arabic language |
title | The Effectiveness of Arabic Stemmers Using Arabized Word Removal |
topic | Arabised word, natural language processing, Arabised words removal, Arabic text pre-processing, Arabic stemming, text processing, Arabic language |
url | https://ijism.ricest.ac.ir/index.php/ijism/article/view/2151 |
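
The description above outlines a combined lexicon-based and rule-based filter that removes Arabised words before stemming. The record does not give the actual lexicon or rules, so the following Python sketch only illustrates the general idea. The lexicon entries, the prefix and suffix lists, the minimum-length guard, and the function names (`candidate_forms`, `is_arabised`, `remove_arabised`) are all assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch only: the lexicon, affix lists, and length guard
# below are assumptions, not the resources or rules used by the authors.

# Tiny hypothetical lexicon of Arabised words
# (television, computer, internet, technology).
ARABISED_LEXICON = {"تلفزيون", "كمبيوتر", "انترنت", "تكنولوجيا"}

# Definite article, alone or preceded by a one-letter proclitic.
# Longer prefixes come first so the longest match wins.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")

# A few common suffixes (plural and relational endings).
SUFFIXES = ("ات", "ية", "ي")

MIN_REMAINDER = 3  # assumed guard: keep at least three letters after stripping


def candidate_forms(token):
    """Return the token plus variants with one prefix and/or one suffix removed."""
    forms = {token}
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= MIN_REMAINDER:
            forms.add(token[len(prefix):])
            break  # strip at most one prefix
    for form in list(forms):
        for suffix in SUFFIXES:
            if form.endswith(suffix) and len(form) - len(suffix) >= MIN_REMAINDER:
                forms.add(form[:-len(suffix)])
    return forms


def is_arabised(token):
    """True if the token, or an affix-stripped variant of it, is in the lexicon."""
    return any(form in ARABISED_LEXICON for form in candidate_forms(token))


def remove_arabised(tokens):
    """Split tokens into (kept, removed); only the kept tokens go on to the stemmer."""
    kept, removed = [], []
    for token in tokens:
        (removed if is_arabised(token) else kept).append(token)
    return kept, removed


if __name__ == "__main__":
    sample = ["التلفزيون", "كتاب", "كمبيوترات", "والانترنت"]
    kept, removed = remove_arabised(sample)
    print("kept:", kept)
    print("removed:", removed)
```

In this sketch the filter runs before any stemmer, so only the `kept` list would be passed on to light stemming, condition light, or ARLS. A real implementation would need a much larger lexicon and rules tuned against the evaluation datasets.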