Syllable-Based Multi-POSMORPH Annotation for Korean Morphological Analysis and Part-of-Speech Tagging

Bibliographic Details
Main Authors: Hyeong Jin Shin, Jeongyeon Park, Jae Sung Lee
Format: Article
Language: English
Published: MDPI AG 2023-02-01
Series: Applied Sciences
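Subjects: Korean morphological analysis; Korean part-of-speech tagging; transformer encoder; syllable-based annotation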
Online Access: https://www.mdpi.com/2076-3417/13/5/2892
description Various research approaches have attempted to solve the length difference problem between the surface form and the base form of words in the Korean morphological analysis and part-of-speech (POS) tagging task. The compound POS tagging method is a popular approach, which tackles the problem using annotation tags. However, a dictionary is required for post-processing to recover the base form and to resolve the ambiguity of compound POS tags, which degrades the system performance. In this study, we propose a novel syllable-based multi-POSMORPH annotation method to solve the length difference problem within one framework, without using a dictionary for post-processing. A multi-POSMORPH tag is created by combining POS tags and morpheme syllables for simultaneous POS tagging and morpheme recovery. The model is implemented with a two-layer transformer encoder, which is lighter than the existing models based on large language models. Nonetheless, the experiments demonstrate that the performance of the proposed model is comparable to, or better than, that of previous models.
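The idea of attaching both morpheme syllables and POS tags to each surface syllable can be illustrated with a minimal sketch. This is not the authors' tag format or model: the `SyllableTag` structure, the `decode` helper, and the hand-written example tags are all assumptions made for illustration. The sketch only shows how per-syllable tags of this kind could recover base-form morphemes without a dictionary lookup, even when the surface form and base form differ in length.

```python
from typing import List, Tuple

# Hypothetical tag layout (an assumption, not the paper's format):
# each surface syllable carries a list of
# (morpheme_syllable, pos_tag, starts_new_morpheme) triples.
SyllableTag = List[Tuple[str, str, bool]]


def decode(tags: List[SyllableTag]) -> List[Tuple[str, str]]:
    """Merge per-syllable tags into (morpheme, POS) pairs."""
    morphemes: List[Tuple[str, str]] = []
    for tag in tags:
        for syllable, pos, starts_new in tag:
            if starts_new or not morphemes:
                # Open a new morpheme with this syllable.
                morphemes.append((syllable, pos))
            else:
                # Extend the morpheme opened by an earlier syllable.
                prev_form, prev_pos = morphemes[-1]
                morphemes[-1] = (prev_form + syllable, prev_pos)
    return morphemes


# Surface form with 2 syllables whose base form has 3 morpheme syllables.
surface = "했다"
predicted = [
    [("하", "VV", True), ("았", "EP", True)],  # one syllable, two morphemes
    [("다", "EF", True)],
]
assert len(predicted) == len(surface)  # exactly one tag per surface syllable
print(decode(predicted))

# A two-syllable noun: the second syllable continues the first morpheme.
noun_tags = [[("학", "NNG", True)], [("교", "NNG", False)]]
print(decode(noun_tags))
```

In this sketch the sequence labeler would only have to predict one tag per input syllable, which is what keeps input and output lengths aligned; the tag values above are written by hand rather than predicted by a model.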
id doaj.art-8f69e2a6bacb4d7ba1d5d630fc18f149
institution Directory of Open Access Journals
issn 2076-3417
doi 10.3390/app13052892
volume 13
issue 5
article_number 2892
affiliation Department of Computer Science, Chungbuk National University, Cheongju 28644, Republic of Korea (all three authors)
topic Korean morphological analysis
Korean part-of-speech tagging
transformer encoder
syllable-based annotation