Dataset of stopwords extracted from Uzbek texts

Filtering stop words is an important task when processing text queries to search for information in large data sets. It reduces the search space without losing semantic meaning. The stop words, which have only grammatical roles and do not contribute to information content, still add...
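
As a rough illustration of the filtering step described above, the sketch below drops stop words from a query before it is used for search. The function name `filter_stop_words` and the four-word list are assumptions made for this example: the words are common Uzbek function words picked by hand, not entries taken from the published dataset, and the whitespace tokenization is a simplification.

```python
# Illustrative stop-word list (assumption): common Uzbek function words
# chosen for this example, NOT entries taken from the published dataset.
STOP_WORDS = {"va", "bilan", "uchun", "bu"}


def filter_stop_words(query: str, stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Lowercase the query, split it on whitespace, and drop stop words."""
    return [token for token in query.lower().split() if token not in stop_words]


# Example query: "Bu kitob va daftar uchun"
tokens = filter_stop_words("Bu kitob va daftar uchun")
print(tokens)  # ['kitob', 'daftar']
```

Only the content-bearing tokens remain, so a search over the filtered query compares fewer terms. A real pipeline would load the stop-word list from the dataset itself and would likely need tokenization that handles punctuation and Uzbek apostrophe variants.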

Bibliographic Details
Main Authors:	Khabibulla Madatov, Shukurla Bekchanov, Jernej Vičič
Format:	Article
Language:	English
Published:	Elsevier 2022-08-01
Series:	Data in Brief
Subjects:	Stop words; Machine Learning; Unigram; Bigram; Collocation
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352340922005522

Similar Items

Dataset of Karakalpak language stop words
by: Khabibulla Madatov, et al.
Published: (2023-06-01)

Text Mining technologies in sociological analysis (using the example of studying students' ideas about the mission of a modern university)
by: Antonina N. Pinchuk, et al.
Published: (2024-03-01)

Dataset for Analysis of Russian-Language Reviews on MOOCs Extracted from Stepik
by: Yulia Dyulicheva
Published: (2022-12-01)

STRUCTURAL PECULIARITIES OF BIGRAM-COLLOCATIONS IN LEGAL ENGLISH
by: Ol’ga M. Litvishko
Published: (2019-08-01)

Classifying cuneiform symbols using machine learning algorithms with unigram features on a balanced dataset
by: Mahmood Maha, et al.
Published: (2023-09-01)

Parallel texts dataset for Uzbek-Kazakh machine translation
by: Bobur Allaberdiev, et al.
Published: (2024-04-01)

Mind the gap: Towards determining which collocations to teach
by: Nizonkiza, Déogratias, et al.
Published: (2019-03-01)

Terminological Collocation as Object under Study
by: I. O. Onal
Published: (2019-01-01)

Towards Kyrgyz stop words
by: Ruslan Isaev, et al.
Published: (2023-12-01)

Updating the dictionary: Semantic change identification based on change in bigrams over time
by: Sanni Nimb, et al.
Published: (2020-08-01)

Automatic Multilingual Stopwords Identification from Very Small Corpora
by: Stefano Ferilli
Published: (2021-09-01)

The Entropy of Words—Learnability and Expressivity across More than 1000 Languages
by: Christian Bentz, et al.
Published: (2017-06-01)

An Effective Preprocessing Step Algorithm in Text Mining Application
by: R. M. Hadi, et al.
Published: (2017-02-01)

The Functions of the Word "Get" in Texts
by: C. Sutarsyah
Published: (2015-08-01)

The Relation Dimension in the Identification and Classification of Lexically Restricted Word Co-Occurrences in Text Corpora
by: Alexander Shvets, et al.
Published: (2022-10-01)

Impact of Negation and AnA-Words on Overall Sentiment Value of the Text Written in the Bosnian Language
by: Sead Jahić, et al.
Published: (2023-06-01)

First-year university students’ productive knowledge of collocations
by: Nizonkiza, Déogratias, et al.
Published: (2013-12-01)

Collocation of word „social“
by: Rudolf Šrámek
Published: (2005-01-01)

A Generalized Approach to Keyphrase Extraction using Extended Lists of Stop Words
by: Svetlana Popova, et al.
Published: (2017-11-01)

PRIMARY AND SECONDARY FUNCTIONS OF WORD CLASES IN UZBEK LANGUAGE
by: Adiba BOTIROVA
Published: (2018-06-01)

A High Efficient Biological Language Model for Predicting Protein–Protein Interactions
by: Yanbin Wang, et al.
Published: (2019-02-01)

A Natural Language Processing Approach to Automated Highlighting of New Information in Clinical Notes
by: Yu-Hsiang Su, et al.
Published: (2020-04-01)

A Bi-Gram Approach for an Exhaustive Arabic Triliteral Roots Lexicon
by: Ebtihal Mustafa, et al.
Published: (2023-03-01)

SumSec: Accurate Prediction of Sumoylation Sites Using Predicted Secondary Structure
by: Abdollah Dehzangi, et al.
Published: (2018-12-01)

The SAFE procedure: a practical stopping heuristic for active learning-based screening in systematic reviews and meta-analyses
by: Josien Boetje, et al.
Published: (2024-03-01)

Kommunikation ÜBER / PER / VIA Internet, Chat, SMS und E-Mail. Wie lassen sich diese digitalen Wege ausdrücken?
by: Monika Hornáček Banášová, et al.
Published: (2023-07-01)

Comparison of Unigram, HMM, CRF and Brill's Part-of-Speech Taggers Available in NLTK Library
by: Michal Kvet, et al.
Published: (2023-05-01)

Authorial compatibility of words in I.A. Goncharov’s novels: comparative stylometry
by: M. Yu. Mukhin, et al.
Published: (2020-12-01)

AUTHOR’S THESAURUS AND WORD COMPATIBILITY: LEXICOSTATISTICAL MODELS OF INDIVIDUAL STYLE
Published: (2019-02-01)

Analisis Komparatif Pengukuran Kemiripan Artikel Ilmiah menggunakan Jaccard dan Levenshtein serta Blocking
by: Muhammad Rizqi Nur, et al.
Published: (2023-08-01)

ABDULLA KADIRI UZBEK LITERARY LANGUAGE AND PRESS DEVELOPMENT SERVICES
by: Murodqosim ABDIYEV
Published: (2020-06-01)

The Students’ Perceptions of the Involvement Load Hypothesis on Collocation Learning
by: Nathaya Un-udom, et al.
Published: (2021-12-01)

Tracing Trends and Patterns of IKN Words in Media and Twitter: A Linguistic Corpus Study (Menelusuri Tren dan Pola Kata IKN di Media dan Twitter: Kajian Korpus Linguisti)
by: Devi Ambarwati Puspitasari, et al.
Published: (2023-10-01)

Japanese Word Sketches: Advances and Problems
by: Irena SRDANOVIĆ, et al.
Published: (2011-10-01)

Olfactory vocabulary and collocation in French
by: Henry Tyne
Published: (2017-12-01)

Evaluasi Daftar Stopword Bahasa Indonesia
by: Faisal Rahutomo, et al.
Published: (2019-01-01)

Stopword Dinamis dengan Pendekatan Statistik
by: Mardi Siswo Utomo
Published: (2015-12-01)

DoubleStrokeNet: Bigram-Level Keystroke Authentication
by: Teodor Neacsu, et al.
Published: (2023-10-01)

Collocational Competence in English Language Teaching Among SS III Students of Nigerian Turkish International College, Abuja, Nigeria
by: Bello Yekeen
Published: (2023-06-01)

An Overview of Collocations in Saeb Tabrizi's Sonnets
by: Mahla Arianpour, et al.
Published: (2023-12-01)