KNOWLEDGE TRANSFER FOR RUSSIAN CONVERSATIONAL TELEPHONE AUTOMATIC SPEECH RECOGNITION

Bibliographic Details
Main Authors: A. N. Romanenko, Y. N. Matveev, W. Minker
Format: Article
Language: English
Published: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2018-03-01
Series: Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Online Access: http://ntv.ifmo.ru/file/article/17617.pdf
Collection: DOAJ (Directory of Open Access Journals)
Abstract: This paper describes a method of knowledge transfer from an ensemble of neural network acoustic models to a student network. The method is used to reduce computational costs and improve the quality of the speech recognition system. The experiments consider two variants of generating class labels from the ensemble of models: interpolation with alignment, and posterior probabilities. The quality of the models was also studied as a function of the smoothing coefficient. This coefficient was built into the output log-linear classifier of the neural network (the softmax layer) and was used both in the ensemble and in the student network. Additionally, the initial and final learning rates were analyzed. We established a relationship between the use of the smoothing coefficient for generating the posterior probabilities and the learning-rate parameters. Finally, applying knowledge transfer to the automatic recognition of Russian conversational telephone speech reduced the WER (Word Error Rate) by 2.49% compared with the model trained on alignment from the ensemble of neural networks.
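
The two label-generation variants and the smoothing coefficient mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the smoothing coefficient acts as a softmax temperature T (logits divided by T), that ensemble members are combined by equal-weight averaging of their posteriors, and that the alignment variant yields one one-hot class label per frame. All function names and toy values are hypothetical.

```python
import math
from typing import List, Sequence


def smoothed_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax with the smoothing coefficient applied as a temperature divisor (assumption).

    temperature > 1 flattens the distribution; temperature == 1 is the plain softmax.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_posteriors(model_logits: Sequence[Sequence[float]],
                        temperature: float = 1.0) -> List[float]:
    """Soft targets for one frame: equal-weight average of each member's smoothed posteriors."""
    probs = [smoothed_softmax(z, temperature) for z in model_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]


def alignment_target(label: int, num_classes: int) -> List[float]:
    """Hard target for one frame whose class label comes from an alignment produced elsewhere."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def cross_entropy(target: Sequence[float], student_probs: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Frame-level cross-entropy between a hard or soft target and the student's posteriors."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, student_probs))


if __name__ == "__main__":
    # Toy frame with 3 output classes; the logits are made-up illustration values.
    teachers = [[2.0, 1.0, 0.1], [1.5, 1.2, 0.3]]
    student_logits = [1.0, 0.8, 0.2]
    T = 2.0  # hypothetical smoothing coefficient, shared by ensemble and student

    hard = alignment_target(label=0, num_classes=3)
    soft = ensemble_posteriors(teachers, temperature=T)
    student = smoothed_softmax(student_logits, temperature=T)

    print("soft targets:", [round(p, 3) for p in soft])
    print("CE vs alignment labels:", round(cross_entropy(hard, student), 4))
    print("CE vs ensemble posteriors:", round(cross_entropy(soft, student), 4))
```

Raising T above 1 flattens both the ensemble targets and the student posteriors, so the soft targets carry information about the relative scores of non-winning classes that a one-hot alignment label discards.
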
ISSN: 2226-1494, 2500-0373
Citation: Vol. 18, No. 2, pp. 236-242, 2018
DOI: 10.17586/2226-1494-2018-18-2-236-242
Keywords: knowledge transfer; smoothing coefficient; softmax; automatic speech recognition; ensemble of neural networks; student-network; conversational telephone speech