Syllable-Based Multi-POSMORPH Annotation for Korean Morphological Analysis and Part-of-Speech Tagging
Various research approaches have attempted to solve the length difference problem between the surface form and the base form of words in the Korean morphological analysis and part-of-speech (POS) tagging task. The compound POS tagging method is a popular approach, which tackles the problem using ann...
Main Authors: | Hyeong Jin Shin, Jeongyeon Park, Jae Sung Lee |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-02-01 |
Series: | Applied Sciences |
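Subjects: | Korean morphological analysis; Korean part-of-speech tagging; transformer encoder; syllable-based annotation |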
Online Access: | https://www.mdpi.com/2076-3417/13/5/2892 |
_version_ | 1797615797767503872 |
---|---|
author | Hyeong Jin Shin Jeongyeon Park Jae Sung Lee |
author_sort | Hyeong Jin Shin |
collection | DOAJ |
description | Various research approaches have attempted to solve the length difference problem between the surface form and the base form of words in Korean morphological analysis and part-of-speech (POS) tagging. The compound POS tagging method is a popular approach that tackles the problem using annotation tags. However, it requires a dictionary in post-processing to recover the base form and to resolve the ambiguity of compound POS tags, which degrades system performance. In this study, we propose a novel syllable-based multi-POSMORPH annotation method that solves the length difference problem within one framework, without using a dictionary for post-processing. A multi-POSMORPH tag is created by combining POS tags and morpheme syllables, enabling simultaneous POS tagging and morpheme recovery. The model is implemented with a two-layer transformer encoder, which is lighter than existing models based on large language models. Nonetheless, the experiments demonstrate that the performance of the proposed model is comparable to, or better than, that of previous models. |
first_indexed | 2024-03-11T07:31:59Z |
format | Article |
id | doaj.art-8f69e2a6bacb4d7ba1d5d630fc18f149 |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T07:31:59Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-8f69e2a6bacb4d7ba1d5d630fc18f149; 2023-11-17T07:16:27Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2023-02-01; vol. 13, no. 5, art. 2892; DOI 10.3390/app13052892; Syllable-Based Multi-POSMORPH Annotation for Korean Morphological Analysis and Part-of-Speech Tagging; Hyeong Jin Shin, Jeongyeon Park, Jae Sung Lee (all: Department of Computer Science, Chungbuk National University, Cheongju 28644, Republic of Korea); https://www.mdpi.com/2076-3417/13/5/2892; Korean morphological analysis; Korean part-of-speech tagging; transformer encoder; syllable-based annotation |
title | Syllable-Based Multi-POSMORPH Annotation for Korean Morphological Analysis and Part-of-Speech Tagging |
title_sort | syllable based multi posmorph annotation for korean morphological analysis and part of speech tagging |
topic | Korean morphological analysis Korean part-of-speech tagging transformer encoder syllable-based annotation |
url | https://www.mdpi.com/2076-3417/13/5/2892 |