Towards stemming error reduction for Malay texts

A text stemmer is a useful language preprocessing tool in information retrieval, text mining, and natural language processing. It maps morphological variants of words to their base forms. Most current text stemmers for the Malay language focus on removing affixes, clit...
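
As a rough illustration of the general idea named in this record's full description (affix removal combined with dictionary lookup), the Python sketch below strips candidate prefixes and suffixes and accepts a result only if it appears in a root-word dictionary. It is not the authors' stemmer: the word lists, the affix lists, the exception table, and the function name `stem` are all hypothetical toy values chosen for this example.

```python
# Hypothetical toy data: NOT the dictionaries or affix rules from the paper.
ROOT_WORDS = {"makan", "main", "minum", "baca", "buku", "berat", "tulis"}

# Second lookup table: derived forms whose root cannot be recovered by
# plain stripping (here, "menulis" drops the initial "t" of "tulis").
EXCEPTIONS = {"menulis": "tulis"}

PREFIXES = ("mem", "ber", "di")
SUFFIXES = ("nya", "lah", "kan", "an", "i")


def stem(word):
    """Return a root word for `word`, or `word` itself if none is found."""
    word = word.lower()

    # Lookup 1: a word that is already a root is returned untouched.
    # This stops "makan" becoming "mak" and "berat" becoming "at".
    if word in ROOT_WORDS:
        return word

    # Lookup 2: irregular derived forms listed explicitly.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]

    # Affix removal: try every prefix/suffix combination (including none)
    # and accept the first candidate confirmed by the root dictionary.
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            candidate = rest[:len(rest) - len(suffix)] if suffix else rest
            if candidate in ROOT_WORDS:
                return candidate

    # No confirmed root: leave the word unchanged rather than guess.
    return word


if __name__ == "__main__":
    for w in ["makanan", "makan", "bermain", "berat", "bukunya", "menulis"]:
        print(w, "->", stem(w))
```

Checking the root dictionary before any stripping is the design choice that matters here: it guards against overstemming words that merely look affixed, which is one common source of stemming errors.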

Bibliographic Details
Main Authors: Kassim, M. N., Jalil, S. H. M., Maarof, Z. A., Zainal, A.
Format: Conference or Workshop Item
Published: 2019
Subjects: QA75 Electronic computers. Computer science
_version_ 1796864714158899200
author Kassim, M. N.
Jalil, S. H. M.
Maarof, Z. A.
Zainal, A.
collection ePrints
description A text stemmer is a useful language preprocessing tool in information retrieval, text mining, and natural language processing. It maps morphological variants of words to their base forms. Most current text stemmers for the Malay language focus on removing affixes, clitics, and particles from affixed words. However, these stemmers still suffer from stemming errors because they do not sufficiently address the root causes of those errors. This paper investigates the root causes of stemming errors and proposes a stemming technique to address them. The proposed text stemmer uses an affix removal method and multiple dictionary lookups to address the various root causes of stemming errors. The experimental results show promising stemming accuracy and a reduction in various possible stemming errors.
first_indexed 2024-03-05T20:45:59Z
format Conference or Workshop Item
id utm.eprints-88862
institution Universiti Teknologi Malaysia - ePrints
last_indexed 2024-03-05T20:45:59Z
publishDate 2019
record_format dspace
spelling utm.eprints-88862 2020-12-29T04:38:41Z http://eprints.utm.my/88862/ Towards stemming error reduction for Malay texts Kassim, M. N. Jalil, S. H. M. Maarof, Z. A. Zainal, A. QA75 Electronic computers. Computer science 2019 Conference or Workshop Item PeerReviewed Kassim, M. N. and Jalil, S. H. M. and Maarof, Z. A. and Zainal, A. (2019) Towards stemming error reduction for Malay texts. In: 5th International Conference on Computational Science and Technology, ICCST 2018, 29-30 Aug 2018, Kota Kinabalu, Malaysia. http://www.dx.doi.org/10.1007/978-981-13-2622-6_2
title Towards stemming error reduction for Malay texts
topic QA75 Electronic computers. Computer science