Towards Kyrgyz stop words

Bibliographic Details
Main Authors: Ruslan Isaev, Gulzada Esenalieva, Ermek Doszhanov
Format: Article
Language: deu
Published: Vilnius University, 2023-12-01
Series: Kalbotyra
Subjects:
Online Access: https://www.zurnalai.vu.lt/kalbotyra/article/view/34292
Description
Summary: The concept of stop words, introduced by H. P. Luhn in the mid-20th century, plays a major role in today’s NLP practice. Removing stop words reduces noise in text data, eliminates uninformative words, speeds up text processing, and minimizes the amount of memory required to store data. The Kyrgyz language is an agglutinative Turkic language for which no scientific study of stop words has previously been published in English. In our study, we combined frequency analysis with rule-based linguistic analysis. First, we identified the most frequently used words, set a frequency threshold, and removed words below it, which gave us a list of the most frequently used words. We then narrowed this list by excluding all words that do not belong to the category of Kyrgyz function words. The result is a list of 50 words that can be considered stop words in the Kyrgyz language. Our analysis used a single corpus of sentences collected and published as an open-source project by a local broadcaster.
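The two-stage procedure described in the summary (frequency thresholding, then filtering down to function words) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tokenization is a simple lowercase regex split, the threshold is treated as a raw occurrence count (the hypothetical parameter `min_count`), and the toy sentences and the small `FUNCTION_WORDS` set are hypothetical examples rather than the paper's corpus or its 50-word list.

```python
import re
from collections import Counter

def tokenize(sentence):
    # Assumption: lowercase and take runs of word characters; in Python 3,
    # \w also matches Cyrillic letters such as ң, ө, ү.
    return re.findall(r"\w+", sentence.lower())

def candidate_stop_words(sentences, min_count):
    """Stage 1 (frequency analysis): keep words occurring at least min_count
    times, ordered from most to least frequent."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return [word for word, n in counts.most_common() if n >= min_count]

def filter_function_words(candidates, function_words):
    """Stage 2 (rule-based analysis): drop candidates that are not function words."""
    return [word for word in candidates if word in function_words]

def remove_stop_words(sentence, stop_words):
    """Apply the resulting list: return the tokens that are not stop words."""
    return [word for word in tokenize(sentence) if word not in stop_words]

# Hypothetical toy corpus and function-word set, for illustration only.
sentences = [
    "Бул китеп жана дептер",
    "Ал мектепке барды жана китеп окуду",
    "Бул үчүн рахмат",
]
FUNCTION_WORDS = {"жана", "менен", "бул", "үчүн", "да"}

candidates = candidate_stop_words(sentences, min_count=2)
stop_words = filter_function_words(candidates, FUNCTION_WORDS)
cleaned = remove_stop_words(sentences[0], set(stop_words))

print(candidates)  # ['бул', 'китеп', 'жана']
print(stop_words)  # ['бул', 'жана']
print(cleaned)     # ['китеп', 'дептер']
```

The frequency stage alone keeps the content word "китеп" (book), which is why the second, linguistic stage is needed: only words in the function-word category survive as stop words.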
ISSN: 1392-1517; 2029-8315