Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language

Bibliographic Details
Main Authors:	Abdinabi Mukhamadiyev, Ilyos Khujayarov, Oybek Djuraev, Jinsoo Cho
Format:	Article
Language:	English
Published:	MDPI AG 2022-05-01
Series:	Sensors
Subjects:	convolutional neural network, end-to-end speech recognition, transformers, CTC-attention, Uzbek language, deep learning
Online Access:	https://www.mdpi.com/1424-8220/22/10/3683

author	Abdinabi Mukhamadiyev, Ilyos Khujayarov, Oybek Djuraev, Jinsoo Cho
collection	DOAJ
description	Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few examples of speech recognition applications. Most studies have mainly concentrated on English, Spanish, Japanese, or Chinese, disregarding other low-resource languages, such as Uzbek, leaving their analysis open. In this paper, we propose an End-To-End Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves speech recognition accuracy by effectively using the CTC objective function in attention model training. We evaluated the linguistic and lay-native speaker performances on the Uzbek language dataset, which was collected as a part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% using 207 h of recordings as an Uzbek language training dataset.
id	doaj.art-a252ecac5001421f925c3344425a9384
institution	Directory of Open Access Journals
issn	1424-8220
spelling	doaj.art-a252ecac5001421f925c3344425a9384; 2023-11-23T12:59:23Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2022-05-01; vol. 22, no. 10, art. 3683; doi:10.3390/s22103683
	Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language
	Abdinabi Mukhamadiyev (0), Ilyos Khujayarov (1), Oybek Djuraev (2), Jinsoo Cho (3)
	(0) Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Korea
	(1) Department of Information Technologies, Samarkand Branch of Tashkent University of Information Technologies Named after Muhammad al-Khwarizmi, Tashkent 140100, Uzbekistan
	(2) Department of Hardware and Software of Control Systems in Telecommunication, Tashkent University of Information Technologies Named after Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan
	(3) Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Korea
	https://www.mdpi.com/1424-8220/22/10/3683
	Keywords: convolutional neural network; end-to-end speech recognition; transformers; CTC-attention; Uzbek language; deep learning
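
The description credits the accuracy gain to using the CTC objective during attention-model training, and reports results as a word error rate. A common way to combine the two objectives is the multi-task loss L = λ·L_CTC + (1 - λ)·L_att. The record does not give the paper's exact formulation or weight, so the following is only a minimal pure-Python sketch under that assumption: the function names and the weight λ = 0.3 are hypothetical, and a real system would compute these losses on batched network outputs inside a deep learning framework. The last function shows how WER is conventionally computed from a word-level edit distance.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over scalar log-values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs, target, blank=0):
    """-log p_ctc(target | x) via the CTC forward (alpha) recursion.

    log_probs: T frames, each a list of per-label log-probabilities.
    target: label ids, without blanks.
    """
    ext = [blank]
    for y in target:
        ext += [y, blank]  # extended sequence: _ y1 _ y2 ... _
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        prev, alpha = alpha, [NEG_INF] * S
        for s in range(S):
            terms = [prev[s]]
            if s >= 1:
                terms.append(prev[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(prev[s - 2])
            alpha[s] = logsumexp(*terms) + frame[ext[s]]
    # Valid paths end on the last label or on the trailing blank.
    return -(logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1])


def attention_loss(decoder_log_probs, target):
    """-log p_att(target | x) with teacher forcing: summed per-token NLL."""
    return -sum(step[y] for step, y in zip(decoder_log_probs, target))


def joint_ctc_attention_loss(ctc_log_probs, decoder_log_probs, target, lam=0.3):
    """Multi-task objective: lam * L_ctc + (1 - lam) * L_att.

    lam = 0.3 is an illustrative assumption, not a value taken from the paper.
    """
    return (lam * ctc_loss(ctc_log_probs, target)
            + (1 - lam) * attention_loss(decoder_log_probs, target))


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[j] holds the edit distance between the current ref prefix and hyp[:j].
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        diag, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,         # deletion (reference word missing)
                      d[j - 1] + 1,     # insertion (extra hypothesis word)
                      diag + (r != h))  # substitution, or match at no cost
            diag, d[j] = d[j], cur
    return d[-1] / len(ref)


if __name__ == "__main__":
    half = math.log(0.5)
    frames = [[half, half], [half, half]]  # 2 frames, labels {0: blank, 1: "a"}
    print(ctc_loss(frames, [1]))  # equals -log(0.75)
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

With two frames of uniform probabilities over {blank, "a"}, three alignments ("aa", "a_", "_a") collapse to "a", so p_ctc = 0.75; in the WER call, one substitution and one deletion over four reference words give 0.5.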

Similar Items