A rule-based stemmer for Arabic Gulf dialect

Arabic dialects have long been widely used in place of Modern Standard Arabic in many fields. The presence of dialects in any language is a major challenge. Dialects add a new set of variational dimensions in fields such as natural language processing, information retrieval and even...


Bibliographic Details
Main Authors: Belal Abuata, Asma Al-Omari
Format: Article
Language: English
Published: Elsevier 2015-04-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Arabic dialect stemmer, Gulf dialect, Rule base stemming, Arabic NLP
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157815000191
author Belal Abuata
Asma Al-Omari
collection DOAJ
description Arabic dialects have long been widely used in place of Modern Standard Arabic in many fields. The presence of dialects in any language is a major challenge. Dialects add a new set of variational dimensions in fields such as natural language processing, information retrieval and even Arabic chatting between different Arab nationals. Unlike Modern Standard Arabic, spoken dialects have no standard morphology, phonology or lexicon. Hence, the objective of this paper is to describe a procedure, or algorithm, by which a stem for the Arabian Gulf dialect can be defined. The algorithm is rule based. Special rules are created to remove the suffixes and prefixes of dialect words. The algorithm also applies rules related to word size and to the relation between adjacent letters. The algorithm was tested on a number of words and gave a good correct-stem ratio. It was also compared with two Modern Standard Arabic stemming algorithms. The results showed that the Modern Standard Arabic stemmers performed poorly on the Arabic Gulf dialect, and that our algorithm performed poorly when applied to Modern Standard Arabic words.
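The affix-removal and word-size rules mentioned in the description can be sketched roughly as follows. This is only an illustration of the general rule-based technique: the affix lists, the minimum stem length and the function name `strip_affixes` are assumptions made for the example, not the rules published in the article, and the adjacent-letter rules are not modelled at all.

```python
# Illustrative sketch of rule-based affix stripping with a word-size check.
# ASSUMPTION: the affix lists and MIN_STEM_LEN below are hypothetical
# examples of common Arabic affixes, not the article's actual rule set.

# Listed longest first so that a compound prefix is tried before its parts.
PREFIXES = ("وال", "بال", "ال", "و", "ب")
SUFFIXES = ("ات", "ين", "ون", "ها", "ة")

# Word-size rule: never strip an affix if fewer letters than this would remain.
MIN_STEM_LEN = 3


def strip_affixes(word: str) -> str:
    """Remove at most one prefix and one suffix, respecting MIN_STEM_LEN."""
    stem = word

    # Prefix rule: strip the first listed prefix that matches and leaves
    # a long enough remainder.
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
            stem = stem[len(prefix):]
            break

    # Suffix rule: same idea, applied to the end of the word.
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
            stem = stem[: -len(suffix)]
            break

    return stem


if __name__ == "__main__":
    for w in ("المعلمين", "والكتاب", "بيت"):
        print(w, "->", strip_affixes(w))
```

Ordering the prefixes longest first and checking the remaining length before each removal keeps a short word such as "بيت" intact even though it begins with a letter that also appears in the prefix list.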
first_indexed 2024-12-19T22:23:22Z
format Article
id doaj.art-011217dcd71043599e8715cb36eed1ab
institution Directory Open Access Journal
issn 1319-1578
language English
last_indexed 2024-12-19T22:23:22Z
publishDate 2015-04-01
publisher Elsevier
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj.art-011217dcd71043599e8715cb36eed1ab; 2022-12-21T20:03:34Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2015-04-01; 27; 2; 104; 112; 10.1016/j.jksuci.2014.04.003; A rule-based stemmer for Arabic Gulf dialect; Belal Abuata; Asma Al-Omari; http://www.sciencedirect.com/science/article/pii/S1319157815000191; Arabic dialect stemmer; Gulf dialect; Rule base stemming; Arabic NLP
title A rule-based stemmer for Arabic Gulf dialect
topic Arabic dialect stemmer
Gulf dialect
Rule base stemming
Arabic NLP
url http://www.sciencedirect.com/science/article/pii/S1319157815000191