Towards stemming error reduction for Malay texts
A text stemmer is a useful language preprocessing tool in information retrieval, text mining and natural language processing. It maps morphological variants of words to their base forms. Most current text stemmers for the Malay language focus on removing affixes, clit...
Main Authors: | Kassim, M. N., Jalil, S. H. M., Maarof, Z. A., Zainal, A. |
---|---|
Format: | Conference or Workshop Item |
Published: | 2019 |
Subjects: | QA75 Electronic computers. Computer science |
_version_ | 1796864714158899200 |
---|---|
author | Kassim, M. N.; Jalil, S. H. M.; Maarof, Z. A.; Zainal, A. |
collection | ePrints |
description | A text stemmer is a useful language preprocessing tool in information retrieval, text mining and natural language processing. It maps morphological variants of words to their base forms. Most current text stemmers for the Malay language focus on removing affixes, clitics, and particles from affixed words. However, these stemmers still suffer from stemming errors because they do not sufficiently address the root causes of those errors. This paper investigates the root causes of stemming errors and proposes a stemming technique to address them. The proposed text stemmer combines an affix removal method with multiple dictionary lookups to address the various root causes of stemming errors. The experimental results showed promising stemming accuracy in reducing various possible stemming errors. |
first_indexed | 2024-03-05T20:45:59Z |
format | Conference or Workshop Item |
id | utm.eprints-88862 |
institution | Universiti Teknologi Malaysia - ePrints |
last_indexed | 2024-03-05T20:45:59Z |
publishDate | 2019 |
record_format | dspace |
spelling | utm.eprints-88862 2020-12-29T04:38:41Z http://eprints.utm.my/88862/ PeerReviewed Kassim, M. N. and Jalil, S. H. M. and Maarof, Z. A. and Zainal, A. (2019) Towards stemming error reduction for Malay texts. In: 5th International Conference on Computational Science and Technology, ICCST 2018, 29-30 Aug 2018, Kota Kinabalu, Malaysia. http://www.dx.doi.org/10.1007/978-981-13-2622-6_2 |
title | Towards stemming error reduction for Malay texts |
topic | QA75 Electronic computers. Computer science |
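
The description above names two ingredients, affix removal and dictionary lookup, without giving details. The following is only a minimal sketch of how the two can be combined, not the authors' algorithm: the word lists, the affix inventory, and the function name `stem` are all hypothetical, and a single toy root dictionary stands in for the multiple dictionaries the paper mentions. The idea it illustrates is that checking the dictionary before and after each stripping step prevents root words that merely look affixed (for example "berat" or "makan") from being over-stemmed.

```python
# Hypothetical toy data: a real stemmer would load far larger word lists.
ROOT_WORDS = {"makan", "main", "baca", "buku", "ajar", "berat"}
PREFIXES = ("meng", "mem", "men", "me", "ber", "ter", "di", "pe")
SUFFIXES = ("kan", "an", "i")
CLITICS_AND_PARTICLES = ("nya", "lah", "kah", "pun", "ku", "mu")


def strip_endings(word, endings):
    """Return the word itself plus every variant with one listed ending removed."""
    variants = [word]
    for ending in endings:
        if word.endswith(ending) and len(word) > len(ending):
            variants.append(word[: -len(ending)])
    return variants


def stem(word, roots=ROOT_WORDS):
    """Strip affixes step by step, accepting a candidate only if it is a known root."""
    word = word.lower()
    # Lookup before any stripping: protects roots such as "berat" or "makan".
    if word in roots:
        return word
    for base in strip_endings(word, CLITICS_AND_PARTICLES):
        for candidate in strip_endings(base, SUFFIXES):
            if candidate in roots:
                return candidate
            for prefix in PREFIXES:
                remainder = candidate[len(prefix):]
                if candidate.startswith(prefix) and remainder in roots:
                    return remainder
    # No dictionary-confirmed root found: leave the word unchanged.
    return word


if __name__ == "__main__":
    for w in ["makanan", "bermain", "membaca", "bukunya", "diajarkan", "berat"]:
        print(w, "->", stem(w))
```

Returning the input unchanged when no candidate is confirmed is a deliberately conservative choice in this sketch: it trades some under-stemming for fewer over-stemming errors.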