Speaker Verification Under Degraded Conditions Using Empirical Mode Decomposition Based Voice Activity Detection Algorithm

The performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training an...


Bibliographic Details
Main Authors:	Rudramurthy M. S., Prasad V. Kamakshi, Kumaraswamy R.
Format:	Article
Language:	English
Published:	De Gruyter 2014-12-01
Series:	Journal of Intelligent Systems
Subjects:	voice activity detection (VAD); zero-frequency filter assisted peaking resonator (ZFFPR); empirical mode decomposition (EMD); speaker verification (SV); Gaussian mixture model–universal background model (GMM-UBM)
Online Access:	https://doi.org/10.1515/jisys-2013-0085

collection	DOAJ
description	The performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noises to the original speech utterances. The different types of noises were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, widely used 39-dimension Mel frequency cepstral coefficient (MFCC; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients) features were used, and Gaussian mixture model–universal background model was used for speaker modeling. The proposed system’s performance was studied against the energy-based VAD used as the front end of the SV system. The proposed SV system showed some encouraging results when EMD-based VAD was used at its front end.
first_indexed	2024-12-21T03:47:08Z
format	Article
id	doaj.art-373a6d7adba44e3ba4673c4c34ab2183
institution	Directory Open Access Journal
issn	0334-1860 2191-026X
language	English
last_indexed	2024-12-21T03:47:08Z
publishDate	2014-12-01
publisher	De Gruyter
record_format	Article
series	Journal of Intelligent Systems
spelling	doaj.art-373a6d7adba44e3ba4673c4c34ab2183; 2022-12-21T19:17:03Z; eng; De Gruyter; Journal of Intelligent Systems; ISSN 0334-1860, 2191-026X; 2014-12-01; vol. 23, no. 4, pp. 359-378; https://doi.org/10.1515/jisys-2013-0085. Authors and affiliations: Rudramurthy M. S. (Department of Information Science and Engineering, S.I.T., Tumkur 572 103, Karnataka State, India); Prasad V. Kamakshi (Department of Computer Science, JNTUH, Kukatpally, Hyderabad 500 085, A.P. State, India); Kumaraswamy R. (Department of Electronics and Communication Engineering, S.I.T., Tumkur 572 103, Karnataka State, India).
title	Speaker Verification Under Degraded Conditions Using Empirical Mode Decomposition Based Voice Activity Detection Algorithm
topic	voice activity detection (VAD); zero-frequency filter assisted peaking resonator (ZFFPR); empirical mode decomposition (EMD); speaker verification (SV); Gaussian mixture model–universal background model (GMM-UBM)
url	https://doi.org/10.1515/jisys-2013-0085
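The abstract in this record describes three front-end steps that can be illustrated in code: mixing noise into clean speech at a chosen signal-to-noise ratio (0 to 20 dB), an energy-based VAD used as the baseline, and 13-dimension MFCCs extended to 39 dimensions with velocity and acceleration coefficients. The following is a minimal sketch under stated assumptions, not the authors' implementation. All function names, the frame length, the threshold rule (a fraction of the mean frame energy), and the simple central-difference delta are hypothetical choices made for illustration; the record does not specify any of them, and the ZFFPR/EMD-based VAD itself is not reproduced here.

```python
import math


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise gain.
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def energy_vad(signal, frame_len=200, ratio=0.5):
    """Assumed baseline: flag frames whose energy exceeds ratio * mean frame energy.

    Frames are non-overlapping; a trailing partial frame is dropped.
    """
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    energies = [
        power(signal[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)
    ]
    threshold = ratio * sum(energies) / len(energies)
    return [e > threshold for e in energies]


def deltas(frames):
    """Central difference over time with edge replication (a simplification)."""
    last = len(frames) - 1
    return [
        [
            (nxt - prv) / 2.0
            for nxt, prv in zip(frames[min(t + 1, last)], frames[max(t - 1, 0)])
        ]
        for t in range(len(frames))
    ]


def stack_39(mfcc13):
    """Append velocity and acceleration to 13-dim frames, giving 39-dim frames."""
    velocity = deltas(mfcc13)
    acceleration = deltas(velocity)
    return [c + v + a for c, v, a in zip(mfcc13, velocity, acceleration)]
```

In a pipeline of this shape, `mix_at_snr` would be applied to each test utterance for each noise type and SNR level, the VAD flags would select which frames are kept, and `stack_39` would run on the MFCCs of the retained frames before GMM-UBM modeling. Production systems typically compute deltas with a regression window over several frames rather than the two-point difference used here.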


Similar Items