Word Game Modeling Using Character-Level N-Gram and Statistics

Bibliographic Details
Main Authors: Jamolbek Mattiev, Ulugbek Salaev, Branko Kavsek
Affiliations: Mattiev and Salaev: Information Technologies Department, Urgench State University, Khamid Alimdjan 14, Urgench 220100, Uzbekistan; Kavsek: Department of Information Sciences and Technologies, University of Primorska, Glagoljaška 8, 6000 Koper, Slovenia
Format: Article
Language: English
Published: MDPI AG, 2023-03-01
Series: Mathematics, Vol. 11, Iss. 6, Art. 1380
ISSN: 2227-7390
DOI: 10.3390/math11061380
Subjects: word game modeling; letter frequency; character-level N-gram; model coverage; statistics
Online Access: https://www.mdpi.com/2227-7390/11/6/1380

Description: Word games are among the most essential tools for vocabulary learning and for matching letters to form words for children aged 5–12. These games help children improve letter and word recognition, memory building, and vocabulary retention. Since Uzbek is a low-resource language, there has not been enough research into designing word games for it. In this paper, we develop two models for designing the cubic-letter game (also known as the matching-letter game) for the Uzbek language. The game consists of a predefined number of six-sided cubes, with a letter on each face, and word cards whose words are formed by combining the cubes. More precisely, the models aim to form as many words as possible from the dataset while minimizing the number of cubes. The proposed methods combine a character-level n-gram model with letter-position frequencies in words, computed separately for vowels and consonants. To perform the experiments, a novel Uzbek dataset of 4.5k three-to-five-letter words was created by filtering words according to child age groups, and three more datasets were generated with expert support for the Russian, English, and Slovenian languages. Experimental evaluations showed that both models achieved good results in terms of average coverage. In particular, the Vowel Priority (VL) approach obtained reasonably high coverage with eight cubes: 95.9% in Uzbek, 96.8% in English, and 94.2% in Slovenian, based on five-fold cross-validation. Both models covered around 85% of five-letter words in the Uzbek, English, and Slovenian datasets, while coverage of three-letter words was even higher (99%) with eight cubes.
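
The record does not include the paper's algorithm, but the coverage criterion described in the abstract can be sketched. The Python below is a minimal, hypothetical illustration: can_form checks whether a word can be spelled from a set of six-sided letter cubes when each cube contributes at most one letter, coverage computes the fraction of a word list a cube set can form, and build_cubes_vowel_priority is an assumed vowel-first frequency heuristic for placing letters on cubes; it is not the authors' VL model and omits the character-level n-gram component.

```python
# A minimal sketch, assuming a plain lowercase word list.
# build_cubes_vowel_priority is an illustrative heuristic, NOT the paper's
# Vowel Priority (VL) model; the n-gram component is not modeled here.
from collections import Counter


def can_form(word, cubes):
    """Return True if `word` can be spelled with the cubes, where each cube
    is a set of face letters and contributes at most one letter per word."""
    def assign(i, used):
        if i == len(word):
            return True
        for j, cube in enumerate(cubes):
            if j not in used and word[i] in cube:
                if assign(i + 1, used | {j}):
                    return True
        return False
    return assign(0, frozenset())


def coverage(words, cubes):
    """Fraction of words in `words` that the cube set can form."""
    return sum(can_form(w, cubes) for w in words) / len(words)


def build_cubes_vowel_priority(words, n_cubes, vowels=frozenset("aeiou")):
    """Illustrative heuristic: rank letters by corpus frequency with vowels
    first, then deal the top 6*n_cubes letters round-robin onto the cubes so
    frequent letters land on different cubes (each distinct letter used once)."""
    freq = Counter(ch for w in words for ch in w)
    ranked = sorted(freq, key=lambda c: (c not in vowels, -freq[c]))
    cubes = [set() for _ in range(n_cubes)]
    for k, ch in enumerate(ranked[:6 * n_cubes]):
        cubes[k % n_cubes].add(ch)
    return cubes


if __name__ == "__main__":
    # Toy word list; the paper's Uzbek/Russian/English/Slovenian datasets
    # are not reproduced here.
    words = ["cat", "dog", "sun", "star", "moon"]
    cubes = build_cubes_vowel_priority(words, n_cubes=8)
    print(f"coverage: {coverage(words, cubes):.2f}")
```

In this toy run, "moon" is not covered because the heuristic places each distinct letter on only one face, so two cubes showing "o" are never available; allowing repeated letters across cubes is exactly the kind of trade-off the paper's coverage-versus-cube-count objective addresses.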