Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition

Spectral information represents short-term speech information within a frame of a few tens of milliseconds, while temporal information captures the evolution of speech statistics over consecutive frames. Motivated by the findings that human speech comprehension relies on the integrity of both the sp...

Bibliographic Details
Main Authors:	Nguyen, Duc Hoang Ha; Xiao, Xiong; Chng, Eng Siong; Li, Haizhou
Other Authors:	School of Computer Science and Engineering
Format:	Journal Article
Language:	English
Published:	2016
Subjects:	Feature adaptation; Temporal filtering
Online Access:	https://hdl.handle.net/10356/84664 http://hdl.handle.net/10220/41916

_version_	1811681489311498240
author	Nguyen, Duc Hoang Ha; Xiao, Xiong; Chng, Eng Siong; Li, Haizhou
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering; Nguyen, Duc Hoang Ha; Xiao, Xiong; Chng, Eng Siong; Li, Haizhou
author_sort	Nguyen, Duc Hoang Ha
collection	NTU
description	Spectral information represents short-term speech information within a frame of a few tens of milliseconds, while temporal information captures the evolution of speech statistics over consecutive frames. Motivated by findings that human speech comprehension relies on the integrity of both the spectral content and the temporal envelope of the speech signal, we study a spectro-temporal transform framework that adapts run-time speech features to minimize the mismatch between run-time and training data, and its implementation, which includes a cross transform and a cascaded transform. A Kullback-Leibler divergence-based cost function is proposed to estimate the transform parameters. We conducted experiments on the REVERB Challenge 2014 task, where clean and multi-condition trained acoustic models are tested with real reverberant and noisy speech. We found that temporal information is important for reverberant speech recognition and that the simultaneous use of spectral and temporal information for feature adaptation is effective. We also investigated the combination of the cross transform with fMLLR, the combination of batch, utterance and speaker mode adaptation, and multi-condition adaptive training using the proposed transforms. All experiments consistently report significant word error rate reductions.
first_indexed	2024-10-01T03:41:45Z
format	Journal Article
id	ntu-10356/84664
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T03:41:45Z
publishDate	2016
record_format	dspace
spelling	ntu-10356/84664 2020-03-07T11:48:57Z Accepted version 2016-12-21T06:12:18Z 2019-12-06T15:49:04Z 2016 Journal Article Nguyen, D. H. H., Xiao, X., Chng, E. S., & Li, H. (2016). Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(6), 1006-1019. 2329-9290 https://hdl.handle.net/10356/84664 http://hdl.handle.net/10220/41916 10.1109/TASLP.2016.2522646 en IEEE/ACM Transactions on Audio, Speech, and Language Processing © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/TASLP.2016.2522646]. 15 p. application/pdf
title	Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition
topic	Feature adaptation; Temporal filtering
url	https://hdl.handle.net/10356/84664 http://hdl.handle.net/10220/41916

Similar Items