MALAY PART OF SPEECH TAGGER: A COMPARATIVE STUDY ON TAGGING TOOLS
Malay is an agglutinative language with rich morphology. Affixation of a root word is the most common morphological process used to derive a new word with a different meaning, which can change its part of speech (POS). An annotated Malay corpus is not freely available...
Main Authors: | Hassan Mohamed, Nazlia Omar, Mohd. Juzaiddin Ab. Aziz |
---|---|
Format: | Article |
Language: | English |
Published: | UKM Press, 2015-06-01 |
Series: | Asia-Pacific Journal of Information Technology and Multimedia |
Subjects: | malay pos tagger, malay morphemes, unknown word |
Online Access: | https://www.ukm.my/apjitm/view.php?id=82 |
_version_ | 1818739042748989440 |
---|---|
author | Hassan Mohamed; Nazlia Omar; Mohd. Juzaiddin Ab. Aziz |
author_sort | Hassan Mohamed |
collection | DOAJ |
description | Malay is an agglutinative language with rich morphology. Affixation of a root word is the most common morphological process used to derive a new word with a different meaning, which can change its part of speech (POS). Because an annotated Malay corpus is not freely available, no published report compares the performance of POS tagging using the Hidden Markov Model (HMM), Maximum Entropy (ME) and Support Vector Machine (SVM), in particular with regard to the effect of Malay morphology on tagging unknown words. This paper presents an evaluation of TnT (HMM), MaxEnt (ME) and SVMTool (SVM). To train and test these methods on Malay, a Malay corpus in the health domain was annotated. TnT was modified to accommodate prefix and circumfix features. The experimental results show that SVMTool outperforms TnT and MaxEnt in overall accuracy (99.23% for SVMTool, 94% for TnT and 96% for MaxEnt) and in accuracy on unknown words (96.78% for SVMTool, 67% for TnT and 86.23% for MaxEnt). MaxEnt outperforms TnT in both overall accuracy and unknown-word accuracy. Because SVMTool reaches 96.78% accuracy on unknown words, it would be the best tool for tagging Malay in a specific domain. |
first_indexed | 2024-12-18T01:18:33Z |
format | Article |
id | doaj.art-9cb80ac979c04165bd3d85a3b90660ae |
institution | Directory of Open Access Journals |
issn | 2289-2192 |
language | English |
last_indexed | 2024-12-18T01:18:33Z |
publishDate | 2015-06-01 |
publisher | UKM Press |
record_format | Article |
series | Asia-Pacific Journal of Information Technology and Multimedia |
spelling | doaj.art-9cb80ac979c04165bd3d85a3b90660ae; 2022-12-21T21:25:54Z; eng; UKM Press; Asia-Pacific Journal of Information Technology and Multimedia; 2289-2192; 2015-06-01; 4(1); 11; 23; https://doi.org/10.17576/apjitm-2015-0401-02; MALAY PART OF SPEECH TAGGER: A COMPARATIVE STUDY ON TAGGING TOOLS; Hassan Mohamed; Nazlia Omar; Mohd. Juzaiddin Ab. Aziz; https://www.ukm.my/apjitm/view.php?id=82; malay pos tagger; malay morphemes; unknown word |
title | MALAY PART OF SPEECH TAGGER: A COMPARATIVE STUDY ON TAGGING TOOLS |
topic | malay pos tagger; malay morphemes; unknown word |
url | https://www.ukm.my/apjitm/view.php?id=82 |
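
The abstract reports two figures per tagger: overall accuracy and accuracy on unknown words. As a minimal sketch of how such figures are commonly computed, the following assumes that an "unknown word" is any token whose surface form does not occur in the training vocabulary; the record does not state the authors' exact definition. The function name, the tag labels and the toy data are hypothetical and are not taken from the paper or its corpus.

```python
from typing import Iterable, Set, Tuple

# One evaluated token: (word, gold_tag, predicted_tag)
Token = Tuple[str, str, str]


def tagging_accuracy(tokens: Iterable[Token], train_vocab: Set[str]) -> Tuple[float, float]:
    """Return (overall_accuracy, unknown_word_accuracy).

    Assumption: a word is "unknown" when its lowercased form is absent
    from the training vocabulary.
    """
    total = correct = 0
    unk_total = unk_correct = 0
    for word, gold, pred in tokens:
        hit = gold == pred
        total += 1
        correct += hit
        if word.lower() not in train_vocab:
            unk_total += 1
            unk_correct += hit
    overall = correct / total if total else 0.0
    unknown = unk_correct / unk_total if unk_total else 0.0
    return overall, unknown


# Hypothetical toy data; the tag labels are illustrative only.
train_vocab = {"saya", "makan", "ubat"}
tokens = [
    ("saya", "PRON", "PRON"),
    ("makan", "VERB", "VERB"),
    ("ubat", "NOUN", "NOUN"),
    ("pemakanan", "NOUN", "NOUN"),  # unseen in training, tagged correctly
    ("berubat", "VERB", "NOUN"),    # unseen in training, tagged wrongly
]

overall, unknown = tagging_accuracy(tokens, train_vocab)
print(f"overall={overall:.2%} unknown={unknown:.2%}")
```

Reporting the two numbers separately matters for a morphologically rich language: affixed forms that never appeared in training can make up a large share of the errors even when overall accuracy looks high.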