Towards Kyrgyz stop words

The concept of stop words, introduced by H. P. Luhn in the mid-20th century, plays a huge role in today’s NLP practice. Stop words are used to reduce noisy text data, remove uninformative words, speed up text processing, and minimize the amount of memory required to store data. The Kyrgyz language is a...

Bibliographic Details
Main Authors: Ruslan Isaev, Gulzada Esenalieva, Ermek Doszhanov
Format: Article
Language: deu
Published: Vilnius University 2023-12-01
Series: Kalbotyra
Subjects: stop words, Kyrgyz language, frequency analysis, Turkic stop words, NLP
Online Access: https://www.zurnalai.vu.lt/kalbotyra/article/view/34292
author Ruslan Isaev
Gulzada Esenalieva
Ermek Doszhanov
collection DOAJ
description The concept of stop words, introduced by H. P. Luhn in the mid-20th century, plays a huge role in today’s NLP practice. Stop words are used to reduce noisy text data, remove uninformative words, speed up text processing, and minimize the amount of memory required to store data. The Kyrgyz language is an agglutinative Turkic language for which no scientific study of stop words has previously been published in English. In our study, we combined frequency analysis with rule-based linguistic analysis. First, we found the most frequently used words, set a threshold, and removed words below the threshold, which gave us a list of the most frequently used words. Then we reduced the list by excluding all words that do not belong to the category of function words of the Kyrgyz language. Finally, we got a list of 50 words that can be considered stop words in the Kyrgyz language. In our analysis, we used a single corpus of sentences collected and posted as an open-source project by one of the local broadcasters.
first_indexed 2024-03-07T14:22:56Z
format Article
id doaj.art-0e461890b17e4ca5af21c57315f5092a
institution Directory Open Access Journal
issn 1392-1517
2029-8315
language deu
last_indexed 2024-04-24T06:59:25Z
publishDate 2023-12-01
publisher Vilnius University
record_format Article
series Kalbotyra
spelling doaj.art-0e461890b17e4ca5af21c57315f5092a
2024-04-22T08:59:18Z
deu
Vilnius University
Kalbotyra
1392-1517
2029-8315
2023-12-01
76
10.15388/Kalbotyra.2023.76.4
Towards Kyrgyz stop words
Ruslan Isaev, https://orcid.org/0000-0003-4426-8837, Ala-Too International University, Kyrgyz Republic
Gulzada Esenalieva, https://orcid.org/0009-0000-9135-1671, Ala-Too International University, Kyrgyz Republic
Ermek Doszhanov, https://orcid.org/0009-0002-4939-5683, Ala-Too International University, Kyrgyz Republic
https://www.zurnalai.vu.lt/kalbotyra/article/view/34292
stop words
Kyrgyz language
frequency analysis
Turkic stop words
NLP
title Towards Kyrgyz stop words
topic stop words
Kyrgyz language
frequency analysis
Turkic stop words
NLP
url https://www.zurnalai.vu.lt/kalbotyra/article/view/34292
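
The procedure outlined in the description (rank words by frequency, apply a threshold, then keep only function words) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function name `candidate_stop_words`, the `min_count` threshold on raw counts, the regex tokenizer, the toy phrases, and the hand-made `function_words` set are all assumptions and are not taken from the article or its corpus.

```python
import re
from collections import Counter


def candidate_stop_words(sentences, function_words, min_count):
    """Return frequent tokens that are also function words, most frequent first.

    All parameters are hypothetical: the article does not state how its
    threshold was defined, so a minimum raw count is assumed here.
    """
    counts = Counter()
    for sentence in sentences:
        # In Python 3, \w matches Unicode letters, so Cyrillic tokens are kept.
        counts.update(re.findall(r"\w+", sentence.lower()))

    # Step 1: frequency analysis, dropping words below the threshold.
    frequent = [word for word, count in counts.most_common() if count >= min_count]

    # Step 2: rule-based filter, keeping only words listed as function words.
    return [word for word in frequent if word in function_words]


# Toy phrases for illustration only (not drawn from the corpus used in the study).
sentences = [
    "китеп жана калем",
    "мен жана сен",
    "алма жана нан",
    "сен менен мен",
    "китеп менен калем",
]

# Hypothetical function-word list; a real one would come from a Kyrgyz grammar.
function_words = {"жана", "менен", "үчүн"}

result = candidate_stop_words(sentences, function_words, min_count=2)
print(result)
```

With these toy inputs, content words such as "китеп" pass the frequency threshold but are removed by the function-word filter, which is the role the description assigns to the linguistic step.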