Towards Kyrgyz stop words
The concept of stop words, introduced by H. P. Luhn in the mid-20th century, plays a major role in today’s NLP practice. Stop words are used to reduce noisy text data, remove uninformative words, speed up text processing, and minimize the amount of memory required to store data. The Kyrgyz language is a...
Main Authors: | Ruslan Isaev, Gulzada Esenalieva, Ermek Doszhanov |
---|---|
Format: | Article |
Language: | deu |
Published: | Vilnius University, 2023-12-01 |
Series: | Kalbotyra |
Subjects: | stop words; Kyrgyz language; frequency analysis; Turkic stop words; NLP |
Online Access: | https://www.zurnalai.vu.lt/kalbotyra/article/view/34292 |
_version_ | 1797198410583900160 |
---|---|
author | Ruslan Isaev; Gulzada Esenalieva; Ermek Doszhanov |
author_sort | Ruslan Isaev |
collection | DOAJ |
description | The concept of stop words, introduced by H. P. Luhn in the mid-20th century, plays a major role in today’s NLP practice. Stop words are used to reduce noisy text data, remove uninformative words, speed up text processing, and minimize the amount of memory required to store data. The Kyrgyz language is an agglutinative Turkic language for which no scientific study of stop words has previously been published in English. In our study, we combined frequency analysis with rule-based linguistic analysis. First, we counted word frequencies, set a threshold, and removed the words that fell below it, which left a list of the most frequently used words. We then reduced this list by excluding every word that does not belong to the category of function words in the Kyrgyz language. The result is a list of 50 words that can be considered stop words in the Kyrgyz language. Our analysis used a single corpus of sentences collected and published as an open-source project by one of the local broadcasters. |
first_indexed | 2024-03-07T14:22:56Z |
format | Article |
id | doaj.art-0e461890b17e4ca5af21c57315f5092a |
institution | Directory Open Access Journal |
issn | 1392-1517 2029-8315 |
language | deu |
last_indexed | 2024-04-24T06:59:25Z |
publishDate | 2023-12-01 |
publisher | Vilnius University |
record_format | Article |
series | Kalbotyra |
spelling | doaj.art-0e461890b17e4ca5af21c57315f5092a; 2024-04-22T08:59:18Z; deu; Vilnius University; Kalbotyra; 1392-1517; 2029-8315; 2023-12-01; 76; 10.15388/Kalbotyra.2023.76.4; Towards Kyrgyz stop words; Ruslan Isaev (0), https://orcid.org/0000-0003-4426-8837; Gulzada Esenalieva (1), https://orcid.org/0009-0000-9135-1671; Ermek Doszhanov (2), https://orcid.org/0009-0002-4939-5683; Ala-Too International University, Kyrgyz Republic (all three authors); https://www.zurnalai.vu.lt/kalbotyra/article/view/34292; stop words; Kyrgyz language; frequency analysis; Turkic stop words; NLP |
title | Towards Kyrgyz stop words |
topic | stop words; Kyrgyz language; frequency analysis; Turkic stop words; NLP |
url | https://www.zurnalai.vu.lt/kalbotyra/article/view/34292 |
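
The two-step procedure summarized in the description (a frequency threshold followed by a function-word filter) can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `candidate_stop_words`, the three toy sentences, the small `function_words` set, and the threshold value are all assumptions made for illustration; the article's actual corpus, threshold, and 50-word list are not reproduced here.

```python
import re
from collections import Counter


def candidate_stop_words(sentences, function_words, min_count):
    """Return frequent tokens that also appear in a function-word list.

    Hypothetical helper: the name, signature, and tokenization rule are
    illustrative assumptions, not taken from the article.
    """
    counts = Counter()
    for sentence in sentences:
        # Lowercase, then split on runs of Unicode word characters.
        counts.update(re.findall(r"\w+", sentence.lower()))

    # Step 1: keep tokens whose count is at or above the threshold,
    # ordered from most to least frequent.
    frequent = [(w, c) for w, c in counts.most_common() if c >= min_count]

    # Step 2: keep only tokens that are in the function-word list.
    return [w for w, _ in frequent if w in function_words]


# Toy data, invented for this sketch (not the broadcaster's corpus).
sentences = [
    "Мен жана сен китеп окудук",
    "Ал досу менен жана агасы менен келди",
    "Китеп жана дептер менен иштедик",
]
# Assumed, deliberately tiny function-word list.
function_words = {"жана", "менен", "үчүн"}

print(candidate_stop_words(sentences, function_words, min_count=2))
```

In this toy run the content word "китеп" passes the frequency threshold but is dropped by the second step because it is not in the function-word list, which is the role the description assigns to the linguistic filter.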