Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language

Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few examples of speech recognition applications. Most studies have mainly concentrated on English, Spanis...


Bibliographic Details
Main Authors: Abdinabi Mukhamadiyev, Ilyos Khujayarov, Oybek Djuraev, Jinsoo Cho
Format: Article
Language: English
Published: MDPI AG 2022-05-01
Series: Sensors
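Subjects: convolutional neural network; end-to-end speech recognition; transformers; CTC-attention; Uzbek language; deep learning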
Online Access: https://www.mdpi.com/1424-8220/22/10/3683
author Abdinabi Mukhamadiyev
Ilyos Khujayarov
Oybek Djuraev
Jinsoo Cho
collection DOAJ
description Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few examples of speech recognition applications. Most studies have mainly concentrated on English, Spanish, Japanese, or Chinese, disregarding other low-resource languages, such as Uzbek, leaving their analysis open. In this paper, we propose an End-To-End Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves speech recognition accuracy by effectively using CTC objective function in attention model training. We evaluated the linguistic and lay-native speaker performances on the Uzbek language dataset, which was collected as a part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% using 207 h of recordings as an Uzbek language training dataset.
first_indexed 2024-03-10T01:55:11Z
format Article
id doaj.art-a252ecac5001421f925c3344425a9384
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-10T01:55:11Z
publishDate 2022-05-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-a252ecac5001421f925c3344425a9384
2023-11-23T12:59:23Z
eng
MDPI AG
Sensors
1424-8220
2022-05-01
22
10
3683
10.3390/s22103683
Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language
Abdinabi Mukhamadiyev [0]
Ilyos Khujayarov [1]
Oybek Djuraev [2]
Jinsoo Cho [3]
[0] Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Korea
[1] Department of Information Technologies, Samarkand Branch of Tashkent University of Information Technologies Named after Muhammad al-Khwarizmi, Tashkent 140100, Uzbekistan
[2] Department of Hardware and Software of Control Systems in Telecommunication, Tashkent University of Information Technologies Named after Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan
[3] Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Korea
Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few examples of speech recognition applications. Most studies have mainly concentrated on English, Spanish, Japanese, or Chinese, disregarding other low-resource languages, such as Uzbek, leaving their analysis open. In this paper, we propose an End-To-End Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves speech recognition accuracy by effectively using CTC objective function in attention model training. We evaluated the linguistic and lay-native speaker performances on the Uzbek language dataset, which was collected as a part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% using 207 h of recordings as an Uzbek language training dataset.
https://www.mdpi.com/1424-8220/22/10/3683
convolutional neural network
end-to-end speech recognition
transformers
CTC-attention
Uzbek language
deep learning
title Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language
topic convolutional neural network
end-to-end speech recognition
transformers
CTC-attention
Uzbek language
deep learning
url https://www.mdpi.com/1424-8220/22/10/3683
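The description in this record reports a word error rate (WER) of 14.3%. As a reading aid, the sketch below shows how WER is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the sample strings are hypothetical and not taken from the article; the authors' actual scoring tool and text normalization are not stated in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: whitespace tokenization, no text
    normalization. Not the scoring pipeline used in the article.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder strings: one substitution and one deletion over four words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.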