GAUSSIAN MIXTURE MODELS FOR ADAPTATION OF DEEP NEURAL NETWORK ACOUSTIC MODELS IN AUTOMATIC SPEECH RECOGNITION SYSTEMS

Subject of Research. We study speaker adaptation of deep neural network (DNN) acoustic models in automatic speech recognition systems. The aim of speaker adaptation techniques is to improve the accuracy of the speech recognition system for a particular speaker. Method. A novel method for training an...

Bibliographic Details
Main Authors:	Natalia A. Tomashenko, Yuri Yu. Khokhlov, Anthony Larcher, Yannick Estève, Yuri N. Matveev
Format:	Article
Language:	English
Published:	Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2016-11-01
Series:	Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Subjects:	automatic speech recognition (ASR); acoustic models; speaker adaptation; deep neural networks (DNN); GMM-derived features; GMMD; maximum a posteriori (MAP); fMLLR; GMM; acoustic model adaptation; fusion
Online Access:	http://ntv.ifmo.ru/file/article/16176.pdf

collection	DOAJ
description	Subject of Research. We study speaker adaptation of deep neural network (DNN) acoustic models in automatic speech recognition systems. The aim of speaker adaptation techniques is to improve the accuracy of the speech recognition system for a particular speaker. Method. A novel method for training and adaptation of DNN acoustic models has been developed. It is based on an auxiliary Gaussian mixture model (GMM) and GMM-derived (GMMD) features. The principal advantage of the proposed GMMD features is that a DNN can be adapted by adapting the auxiliary GMM. Any adaptation method for the auxiliary GMM can be used, so the approach provides a universal way to transfer adaptation algorithms developed for GMMs to DNN adaptation. Main Results. The effectiveness of the proposed approach is shown using one of the most common adaptation algorithms for GMMs, maximum a posteriori (MAP) adaptation. Several ways of integrating the proposed approach into a state-of-the-art DNN architecture are proposed and explored. The choice of the type of auxiliary GMM is analyzed. Experimental results on the TED-LIUM corpus demonstrate that, in unsupervised adaptation mode, the proposed technique provides approximately an 11–18% relative word error rate (WER) reduction on different adaptation sets compared to the speaker-independent DNN system built on conventional features, and a 3–6% relative WER reduction compared to the SAT-DNN trained on fMLLR-adapted features.
first_indexed	2024-12-21T22:23:10Z
id	doaj.art-bf005eb246874a83acffa1b6722b35c8
institution	Directory Open Access Journal
issn	2226-1494 2500-0373
last_indexed	2024-12-21T22:23:10Z
doi	10.17586/2226-1494-2016-16-6-1063-1072
volume	16
issue	6
pages	1063-1072
affiliation	Natalia A. Tomashenko: postgraduate, Laboratory of Computer Science of the University of Le Mans (LIUM), Le Mans, 72085, France; researcher, "STC-Innovation", Ltd., Saint Petersburg, 196084, Russian Federation; postgraduate, ITMO University, Saint Petersburg, 197101, Russian Federation
affiliation	Yuri Yu. Khokhlov: leading programmer, "STC-Innovations", Ltd., Saint Petersburg, 196084, Russian Federation
affiliation	Anthony Larcher: PhD, Associate professor, Laboratory of Computer Science of the University of Le Mans (LIUM), Le Mans, 72085, France
affiliation	Yannick Estève: D.Sc., Professor, Director, Laboratory of Computer Science of the University of Le Mans (LIUM), Le Mans, 72085, France
affiliation	Yuri N. Matveev: D.Sc., Chief scientific researcher, "STC-Innovation", Ltd., Saint Petersburg, 196084, Russian Federation; Head of Chair, ITMO University, Saint Petersburg, 197101, Russian Federation
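
The abstract in this record describes adapting a DNN by adapting an auxiliary GMM and feeding GMM-derived features to the network. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a single diagonal-covariance GMM, mean-only MAP adaptation with a relevance factor `tau`, and a feature vector made of per-component log-likelihoods; all function names, the toy data, and the value of `tau` are hypothetical. The last function shows the arithmetic behind a relative WER reduction.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_likelihoods(x, weights, means, variances):
    """Per-component log(w_k * N(x | mean_k, var_k))."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def map_adapt_means(frames, weights, means, variances, tau=10.0):
    """Mean-only MAP adaptation; weights and variances are left unchanged.

    Assumption: the standard relevance-factor update
    mean_k <- (tau * mean_k + sum_t gamma_tk * x_t) / (tau + sum_t gamma_tk).
    """
    dim = len(means[0])
    occupancy = [0.0] * len(means)
    weighted_sum = [[0.0] * dim for _ in means]
    for x in frames:
        scores = log_likelihoods(x, weights, means, variances)
        norm = logsumexp(scores)
        for k, score in enumerate(scores):
            gamma = math.exp(score - norm)  # component posterior for this frame
            occupancy[k] += gamma
            for d in range(dim):
                weighted_sum[k][d] += gamma * x[d]
    return [
        [
            (tau * means[k][d] + weighted_sum[k][d]) / (tau + occupancy[k])
            for d in range(dim)
        ]
        for k in range(len(means))
    ]


def gmm_derived_features(x, weights, means, variances):
    """Hypothetical GMMD-style features: one log-likelihood per component."""
    return log_likelihoods(x, weights, means, variances)


def relative_wer_reduction(wer_baseline, wer_adapted):
    """Relative WER reduction in percent."""
    return 100.0 * (wer_baseline - wer_adapted) / wer_baseline


# Toy two-component model and a handful of frames from one speaker.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
speaker_frames = [[0.5, 0.4], [0.6, 0.7], [4.4, 4.6], [4.5, 4.3]]

adapted_means = map_adapt_means(speaker_frames, weights, means, variances, tau=2.0)
features = gmm_derived_features(speaker_frames[0], weights, adapted_means, variances)
```

In this sketch the DNN itself is never retrained for the speaker: only the GMM means move toward the speaker's data, and the features computed from the adapted GMM change accordingly. A small `tau` trusts the adaptation data more, while a large `tau` keeps the means close to the speaker-independent model.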

Similar Items