Experimental Study of Morphological Analyzers for Topic Categorization in News Articles

Natural language processing refers to the ability of computers to understand text and spoken words similar to humans. Recently, various machine learning techniques have been used to encode a large amount of text and decode feature vectors of text successfully. However, understanding low-resource lan...

Bibliographic Details
Main Author:	Sangtae Ahn
Format:	Article
Language:	English
Published:	MDPI AG 2023-09-01
Series:	Applied Sciences
Subjects:	natural language processing morphological analyzer topic categorization news article
Online Access:	https://www.mdpi.com/2076-3417/13/19/10572

_version_	1797576226552938496
author	Sangtae Ahn
author_facet	Sangtae Ahn
author_sort	Sangtae Ahn
collection	DOAJ
description	Natural language processing refers to the ability of computers to understand text and spoken words similar to humans. Recently, various machine learning techniques have been used to encode a large amount of text and decode feature vectors of text successfully. However, understanding low-resource languages is in the early stages of research. In particular, Korean, which is an agglutinative language, needs sophisticated preprocessing steps, such as morphological analysis. Since morphological analysis in preprocessing significantly influences classification results, ideal and optimized morphological analyzers must be used. This study explored five state-of-the-art morphological analyzers for Korean news articles and categorized their topics into seven classes using term frequency–inverse document frequency and light gradient boosting machine frameworks. It was found that a morphological analyzer based on unsupervised learning achieved a computation time of 6 s for 500,899 tokens, which is 72 times faster than the slowest analyzer (432 s). In addition, a morphological analyzer using dynamic programming achieved a topic categorization accuracy of 82.5%, which is 9.4 percentage points higher than that achieved with the hidden Markov model (73.1%) and 13.4 percentage points higher than the baseline (69.1%), which used no morphological analyzer. This study can provide insight into how each morphological analyzer extracts morphemes from sentences and how it affects topic categorization in news articles.
first_indexed	2024-03-10T21:49:17Z
format	Article
id	doaj.art-1f100bf0f4304757b8abd4ef6ee35332
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T21:49:17Z
publishDate	2023-09-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-1f100bf0f4304757b8abd4ef6ee35332 | 2023-11-19T14:01:15Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2023-09-01 | vol. 13, no. 19, art. 10572 | 10.3390/app131910572 | Experimental Study of Morphological Analyzers for Topic Categorization in News Articles | Sangtae Ahn (School of Electronics Engineering, Kyungpook National University, Daegu 41566, Republic of Korea) | Natural language processing refers to the ability of computers to understand text and spoken words similar to humans. Recently, various machine learning techniques have been used to encode a large amount of text and decode feature vectors of text successfully. However, understanding low-resource languages is in the early stages of research. In particular, Korean, which is an agglutinative language, needs sophisticated preprocessing steps, such as morphological analysis. Since morphological analysis in preprocessing significantly influences classification results, ideal and optimized morphological analyzers must be used. This study explored five state-of-the-art morphological analyzers for Korean news articles and categorized their topics into seven classes using term frequency–inverse document frequency and light gradient boosting machine frameworks. It was found that a morphological analyzer based on unsupervised learning achieved a computation time of 6 s for 500,899 tokens, which is 72 times faster than the slowest analyzer (432 s). In addition, a morphological analyzer using dynamic programming achieved a topic categorization accuracy of 82.5%, which is 9.4 percentage points higher than that achieved with the hidden Markov model (73.1%) and 13.4 percentage points higher than the baseline (69.1%), which used no morphological analyzer. This study can provide insight into how each morphological analyzer extracts morphemes from sentences and how it affects topic categorization in news articles. | https://www.mdpi.com/2076-3417/13/19/10572 | natural language processing; morphological analyzer; topic categorization; news article
spellingShingle	Sangtae Ahn Experimental Study of Morphological Analyzers for Topic Categorization in News Articles Applied Sciences natural language processing morphological analyzer topic categorization news article
title	Experimental Study of Morphological Analyzers for Topic Categorization in News Articles
title_full	Experimental Study of Morphological Analyzers for Topic Categorization in News Articles
title_fullStr	Experimental Study of Morphological Analyzers for Topic Categorization in News Articles
title_full_unstemmed	Experimental Study of Morphological Analyzers for Topic Categorization in News Articles
title_short	Experimental Study of Morphological Analyzers for Topic Categorization in News Articles
title_sort	experimental study of morphological analyzers for topic categorization in news articles
topic	natural language processing morphological analyzer topic categorization news article
url	https://www.mdpi.com/2076-3417/13/19/10572
work_keys_str_mv	AT sangtaeahn experimentalstudyofmorphologicalanalyzersfortopiccategorizationinnewsarticles

Similar Items