Stemming Hausa text: using affix-stripping rules and reference look-up
Stemming is the process of reducing a derived or inflected word to its root or stem by stripping all its affixes. It has been used in applications such as information retrieval, machine translation, and text summarization as a pre-processing step to increase efficiency. Currently, there ar...
Main Authors: | Bimba, A.T., Idris, N., Khamis, N., Mohd Noor, N.F. |
---|---|
Format: | Article |
Published: | Springer Verlag (Germany) 2016 |
Subjects: | QA75 Electronic computers. Computer science |
_version_ | 1796960517630197760 |
---|---|
author | Bimba, A.T. Idris, N. Khamis, N. Mohd Noor, N.F. |
author_facet | Bimba, A.T. Idris, N. Khamis, N. Mohd Noor, N.F. |
author_sort | Bimba, A.T. |
collection | UM |
description | Stemming is the process of reducing a derived or inflected word to its root or stem by stripping all its affixes. It has been used in applications such as information retrieval, machine translation, and text summarization as a pre-processing step to increase efficiency. Currently, there are a few stemming algorithms which have been developed for languages such as English, Arabic, Turkish, Malay and Amharic. Unfortunately, no algorithm has been used to stem text in Hausa, a Chadic language spoken in West Africa. To address this need, we propose stemming Hausa text using affix-stripping rules and reference look-up. We stemmed Hausa text using 78 affix-stripping rules applied in 4 steps and a reference look-up consisting of 1500 Hausa root words. The over-stemming index, under-stemming index, stemmer weight, word stemmed factor, correctly stemmed words factor and average words conflation factor were calculated to determine the effect of reference look-up on the strength and accuracy of the stemmer. It was observed that reference look-up helped reduce both over-stemming and under-stemming errors, increased accuracy, and tended to reduce the strength of an affix-stripping stemmer. The rationale behind the approach is discussed and directions for future research are identified. |
first_indexed | 2024-03-06T05:45:49Z |
format | Article |
id | um.eprints-18607 |
institution | Universiti Malaya |
last_indexed | 2024-03-06T05:45:49Z |
publishDate | 2016 |
publisher | Springer Verlag (Germany) |
record_format | dspace |
spelling | um.eprints-18607 2018-04-25T07:05:50Z http://eprints.um.edu.my/18607/ QA75 Electronic computers. Computer science Springer Verlag (Germany) 2016-09 Article PeerReviewed Bimba, A.T. and Idris, N. and Khamis, N. and Mohd Noor, N.F. (2016) Stemming Hausa text: using affix-stripping rules and reference look-up. Language Resources and Evaluation, 50 (3). pp. 687-703. ISSN 1574-020X, DOI https://doi.org/10.1007/s10579-015-9311-x |
title | Stemming Hausa text: using affix-stripping rules and reference look-up |
topic | QA75 Electronic computers. Computer science |
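
The description above outlines a stemmer that applies affix-stripping rules in ordered steps and consults a reference list of root words. A minimal Python sketch of that general idea follows. It is not the authors' implementation: the affixes in `STEPS`, the roots in `ROOTS`, the minimum stem length, and the point at which the look-up is consulted are all placeholder assumptions chosen for illustration, not the paper's 78 rules or its 1500-word list.

```python
from typing import FrozenSet, Sequence, Tuple

# Hypothetical rule steps (placeholders, NOT the rules from the paper).
# Each step is (kind, affixes); steps are applied in order.
STEPS: Sequence[Tuple[str, Tuple[str, ...]]] = (
    ("prefix", ("mai", "ma")),
    ("suffix", ("oci", "una")),
    ("suffix", ("ai", "ni")),
    ("suffix", ("n", "r")),
)

# Hypothetical reference look-up of known roots (placeholder entries).
ROOTS: FrozenSet[str] = frozenset({"gida", "malam", "ruwa"})

MIN_STEM_LEN = 3  # assumed guard against stripping a word down to nothing


def strip_affix(word: str, kind: str, affixes: Sequence[str], min_len: int) -> str:
    """Remove the longest matching affix if the remainder is long enough."""
    for affix in sorted(affixes, key=len, reverse=True):
        if len(word) - len(affix) < min_len:
            continue
        if kind == "prefix" and word.startswith(affix):
            return word[len(affix):]
        if kind == "suffix" and word.endswith(affix):
            return word[: -len(affix)]
    return word


def stem(
    word: str,
    steps: Sequence[Tuple[str, Sequence[str]]] = STEPS,
    roots: FrozenSet[str] = ROOTS,
    min_len: int = MIN_STEM_LEN,
) -> str:
    """Apply the rule steps in order, stopping once a known root is reached."""
    current = word.lower()
    for kind, affixes in steps:
        # Reference look-up: a known root is never stripped further.
        if current in roots:
            return current
        current = strip_affix(current, kind, affixes, min_len)
    return current


if __name__ == "__main__":
    print(stem("gidan"))                     # suffix stripped in the last step
    print(stem("malam"))                     # protected by the look-up
    print(stem("malam", roots=frozenset()))  # over-stemmed without the look-up
```

In this sketch, checking the look-up before each step is what keeps a listed root that happens to begin or end with an affix-like string from being over-stemmed; passing an empty `roots` set shows the behaviour of the rules alone.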