MALAY PART OF SPEECH TAGGER: A COMPARATIVE STUDY ON TAGGING TOOLS

Malay is an agglutinative language with rich morphology. Affixation to a root word is the most common morphological process used to derive new words, and it can change a word's part of speech (POS). Annotated Malay corpora are not freely available...
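As a rough illustration of why affixes matter for tagging, the sketch below extracts prefix, suffix and circumfix cues from a word by plain string matching. The affix lists, the `MIN_STEM` guard and the function name `affix_features` are assumptions made for this example; they are not the feature set or the TnT modification used by the authors.

```python
# Illustrative affix inventory (an assumption, not the paper's feature set).
PREFIXES = ("meng", "mem", "men", "me", "ber", "ter", "per", "pe", "di", "ke", "se")
SUFFIXES = ("kan", "an", "i")
CIRCUMFIXES = (("ke", "an"), ("per", "an"), ("pe", "an"), ("me", "kan"), ("di", "kan"))

MIN_STEM = 3  # hypothetical guard: leave at least 3 letters for the stem


def affix_features(word: str) -> dict:
    """Return surface affix cues for a word using naive string matching."""
    w = word.lower()
    # First prefix in the list that matches and leaves a long enough stem.
    prefix = next(
        (p for p in PREFIXES if w.startswith(p) and len(w) - len(p) >= MIN_STEM),
        None,
    )
    # First suffix in the list that matches and leaves a long enough stem.
    suffix = next(
        (s for s in SUFFIXES if w.endswith(s) and len(w) - len(s) >= MIN_STEM),
        None,
    )
    # A circumfix needs both halves present around a long enough stem.
    circumfix = next(
        (
            f"{p}-...-{s}"
            for p, s in CIRCUMFIXES
            if w.startswith(p) and w.endswith(s) and len(w) - len(p) - len(s) >= MIN_STEM
        ),
        None,
    )
    return {"prefix": prefix, "suffix": suffix, "circumfix": circumfix}


features = affix_features("kesihatan")  # ke- + sihat + -an ("health")
```

Surface matching of this kind over-generates (a root such as "jalan" also ends in "an"), so a tagger would treat these cues as evidence to weigh, not as a morphological analysis.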


Bibliographic Details
Main Authors: Hassan Mohamed, Nazlia Omar, Mohd. Juzaiddin Ab. Aziz
Format: Article
Language: English
Published: UKM Press 2015-06-01
Series: Asia-Pacific Journal of Information Technology and Multimedia
Subjects: malay pos tagger, malay morphemes, unknown word
Online Access: https://www.ukm.my/apjitm/view.php?id=82
collection DOAJ
description Malay is an agglutinative language with rich morphology. Affixation to a root word is the most common morphological process used to derive new words, and it can change a word's part of speech (POS). Because annotated Malay corpora are not freely available, no published report compares the performance of POS tagging using the Hidden Markov Model (HMM), Maximum Entropy (ME) and Support Vector Machine (SVM) approaches, especially with regard to the effect of Malay morphology on tagging unknown words. This paper presents an evaluation of TnT (HMM), MaxEnt (ME) and SVMTool (SVM). To train and test these methods on Malay, a Malay corpus in the health domain was annotated. TnT was modified to accommodate prefix and circumfix features. The experimental results show that SVMTool outperforms TnT and MaxEnt in overall accuracy (99.23% for SVMTool, 94% for TnT and 96% for MaxEnt) and in accuracy on unknown words (96.78% for SVMTool, 67% for TnT and 86.23% for MaxEnt). MaxEnt outperforms TnT in both overall accuracy and unknown-word accuracy. Since SVMTool reaches 96.78% accuracy on unknown words, it would be the best tool for tagging Malay in a specific domain.
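The abstract reports two figures per tool: overall accuracy and accuracy on unknown words, that is, test tokens whose word form never appeared in the training data. A minimal sketch of how such a split could be computed is given below; the function name `tagging_accuracy`, the toy sentence and the tag labels are hypothetical and are not taken from the paper's corpus or evaluation scripts.

```python
def tagging_accuracy(train_vocab, gold, predicted):
    """Compute overall, known-word and unknown-word tagging accuracy.

    train_vocab: set of word forms seen during training
    gold:        list of (word, gold_tag) pairs from the test set
    predicted:   list of predicted tags, aligned with `gold`
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    counts = {"overall": [0, 0], "known": [0, 0], "unknown": [0, 0]}  # [correct, total]
    for (word, gold_tag), pred_tag in zip(gold, predicted):
        bucket = "known" if word in train_vocab else "unknown"
        hit = int(pred_tag == gold_tag)
        for key in ("overall", bucket):
            counts[key][0] += hit
            counts[key][1] += 1

    # Report None for an empty bucket instead of dividing by zero.
    return {
        key: (correct / total if total else None)
        for key, (correct, total) in counts.items()
    }


# Toy data with hypothetical tags (N = noun, V = verb).
train_vocab = {"doktor", "merawat"}
gold = [("doktor", "N"), ("merawat", "V"), ("pesakit", "N"), ("kesihatan", "N")]
predicted = ["N", "V", "N", "V"]
scores = tagging_accuracy(train_vocab, gold, predicted)
```

Reporting the unknown-word bucket separately is what makes the gap between tools visible: a tagger can score well overall while still guessing poorly on derived forms it has never seen.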
first_indexed 2024-12-18T01:18:33Z
format Article
id doaj.art-9cb80ac979c04165bd3d85a3b90660ae
institution Directory Open Access Journal
issn 2289-2192
language English
last_indexed 2024-12-18T01:18:33Z
publishDate 2015-06-01
publisher UKM Press
record_format Article
series Asia-Pacific Journal of Information Technology and Multimedia
spelling doaj.art-9cb80ac979c04165bd3d85a3b90660ae; 2022-12-21T21:25:54Z; eng; UKM Press; Asia-Pacific Journal of Information Technology and Multimedia; 2289-2192; 2015-06-01; 4(1); 11; 23; https://doi.org/10.17576/apjitm-2015-0401-02; MALAY PART OF SPEECH TAGGER: A COMPARATIVE STUDY ON TAGGING TOOLS; Hassan Mohamed, Nazlia Omar, Mohd. Juzaiddin Ab. Aziz; https://www.ukm.my/apjitm/view.php?id=82; malay pos tagger, malay morphemes, unknown word
topic malay pos tagger
malay morphemes
unknown word