Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech

In this paper, we propose a framework for joint normalization of the spectral and temporal statistics of speech features for robust speech recognition. Current feature normalization approaches normalize the spectral and temporal aspects of feature statistics separately to overcome noise and reverberation. As a result, the interaction between spectral normalization (e.g., mean and variance normalization, MVN) and temporal normalization (e.g., temporal structure normalization, TSN) is ignored. We propose a joint spectral and temporal normalization (JSTN) framework to normalize these two aspects of feature statistics simultaneously. In JSTN, feature trajectories are filtered by linear filters whose coefficients are optimized by maximizing a likelihood-based objective function. Experimental results on the Aurora-5 benchmark task show that JSTN consistently outperforms the cascade of MVN and TSN on test data corrupted by both additive noise and reverberation, which validates our proposal. Specifically, JSTN reduces the average word error rate by 8-9% relative over the MVN-TSN cascade on both artificial and real noisy data.
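
The abstract describes two generic operations on feature trajectories: per-utterance normalization of spectral statistics (as in MVN) and linear filtering of each dimension's temporal trajectory (as in TSN and JSTN). Below is a minimal Python sketch of these two steps, assuming MFCC-like features stored as a NumPy array of shape (frames, dimensions); the filter taps used here are an illustrative placeholder, not the likelihood-optimized JSTN coefficients described in the paper.

import numpy as np

# A minimal sketch, not the authors' implementation: utterance-level mean and
# variance normalization (MVN) followed by linear filtering of each feature
# dimension's trajectory, the generic operation that TSN and the proposed JSTN
# build on. In JSTN the filter coefficients would be learned by maximizing a
# likelihood-based objective; here `taps` is a hypothetical placeholder.

def mvn(features):
    """Normalize each feature dimension to zero mean and unit variance over frames."""
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8  # avoid division by zero for flat dimensions
    return (features - mean) / std

def filter_trajectories(features, taps):
    """Filter every feature dimension's trajectory with the same FIR taps."""
    return np.stack(
        [np.convolve(features[:, d], taps, mode="same") for d in range(features.shape[1])],
        axis=1,
    )

# Example: 200 frames of 13-dimensional MFCC-like features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 13))
taps = np.array([0.25, 0.5, 0.25])   # placeholder smoothing filter, not learned JSTN taps
normalized = filter_trajectories(mvn(feats), taps)
print(normalized.shape)              # (200, 13)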

Bibliographic Details
Main Authors: Xiao, Xiong; Chng, Eng Siong; Li, Haizhou
Other Authors: School of Computer Engineering; Temasek Laboratories
Format: Conference Paper
Language: English
Published: 2012
Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2012), Kyoto, Japan
Citation: Xiao, X., Chng, E. S., & Li, H. (2012). Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4325-4328.
DOI: 10.1109/ICASSP.2012.6288876
Subjects: DRNTU::Engineering::Computer science and engineering
Rights: © 2012 IEEE.
Institution: Nanyang Technological University
Online Access: https://hdl.handle.net/10356/98409
http://hdl.handle.net/10220/13398