Stemming Hausa text: using affix-stripping rules and reference look-up

Stemming is the process of reducing a derived or inflected word to its root or stem by stripping its affixes. It has been used as a pre-processing step in applications such as information retrieval, machine translation, and text summarization to increase efficiency. Currently, there ar...


Bibliographic Details
Main Authors: Bimba, A.T., Idris, N., Khamis, N., Mohd Noor, N.F.
Format: Article
Published: Springer Verlag (Germany) 2016
Subjects: QA75 Electronic computers. Computer science
author Bimba, A.T.
Idris, N.
Khamis, N.
Mohd Noor, N.F.
collection UM
description Stemming is the process of reducing a derived or inflected word to its root or stem by stripping its affixes. It has been used as a pre-processing step in applications such as information retrieval, machine translation, and text summarization to increase efficiency. Currently, a few stemming algorithms have been developed for languages such as English, Arabic, Turkish, Malay and Amharic. However, no algorithm has been used to stem text in Hausa, a Chadic language spoken in West Africa. To address this need, we propose stemming Hausa text using affix-stripping rules and reference look-up. We stemmed Hausa text using 78 affix-stripping rules applied in 4 steps and a reference look-up consisting of 1500 Hausa root words. The over-stemming index, under-stemming index, stemmer weight, word stemmed factor, correctly stemmed words factor and average words conflation factor were calculated to determine the effect of reference look-up on the strength and accuracy of the stemmer. Reference look-up was observed to reduce both over-stemming and under-stemming errors, to increase accuracy, and to tend to reduce the strength of an affix-stripping stemmer. The rationale behind the approach is discussed and directions for future research are identified.
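The approach described above (ordered affix-stripping steps guarded by a reference list of roots) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the two suffix steps, the three-entry root list, the minimum stem length, and the names `SUFFIX_STEPS`, `REFERENCE_ROOTS`, `MIN_STEM_LEN` and `stem` are all hypothetical placeholders, and none of them is taken from the paper's 78 rules or its 1500-word list. The strings are toy inputs and make no claim about Hausa morphology. Only suffixes are handled here to keep the sketch short.

```python
# Hypothetical suffix rules grouped into ordered steps.
# Placeholders only: these are NOT the 78 rules from the paper.
SUFFIX_STEPS = [
    ["oci", "una"],  # step 1 (illustrative endings)
    ["ai"],          # step 2 (illustrative ending)
]

# Hypothetical stand-in for the 1500-entry reference list of root words.
REFERENCE_ROOTS = frozenset({"gida", "mota", "kauna"})

# Assumed guard so that stripping never leaves a stem shorter than this.
MIN_STEM_LEN = 2


def stem(word, roots=REFERENCE_ROOTS, use_lookup=True):
    """Strip at most one suffix per step; stop early on a known root."""
    word = word.lower()
    for suffixes in SUFFIX_STEPS:
        # Reference look-up: a word already listed as a root is left alone,
        # which is one way to avoid over-stemming it.
        if use_lookup and word in roots:
            return word
        # Try longer suffixes first so the longest match wins.
        for suffix in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
                word = word[: -len(suffix)]
                break
    return word


if __name__ == "__main__":
    for token in ["motoci", "kauna"]:
        print(token, stem(token), stem(token, use_lookup=False))
```

Comparing `stem(w)` with `stem(w, use_lookup=False)` over a word list gives a simple way to see where the look-up prevents a root from being stripped further, which is the kind of effect the evaluation measures quantify.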
first_indexed 2024-03-06T05:45:49Z
format Article
id um.eprints-18607
institution Universiti Malaya
last_indexed 2024-03-06T05:45:49Z
publishDate 2016
publisher Springer Verlag (Germany)
record_format dspace
spelling um.eprints-18607 2018-04-25T07:05:50Z http://eprints.um.edu.my/18607/ Bimba, A.T. and Idris, N. and Khamis, N. and Mohd Noor, N.F. (2016) Stemming Hausa text: using affix-stripping rules and reference look-up. Language Resources and Evaluation, 50 (3). pp. 687-703. ISSN 1574-020X, DOI https://doi.org/10.1007/s10579-015-9311-x. Springer Verlag (Germany), 2016-09. Article, PeerReviewed. QA75 Electronic computers. Computer science
title Stemming Hausa text: using affix-stripping rules and reference look-up
topic QA75 Electronic computers. Computer science