Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition

Speaker recognition systems perform very well on datasets without noise and mismatch. However, performance degrades under environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in impro...

Bibliographic Details
Main Authors:	Wondimu Lambamo, Ramasamy Srinivasagan, Worku Jifara
Format:	Article
Language:	English
Published:	MDPI AG 2022-12-01
Series:	Applied Sciences
Subjects:	speaker identification; speaker verification; Mel Spectrogram; Cochleogram; 2DCNN; ResNet-50
Online Access:	https://www.mdpi.com/2076-3417/13/1/569

author	Wondimu Lambamo, Ramasamy Srinivasagan, Worku Jifara
author_sort	Wondimu Lambamo
collection	DOAJ
description	Speaker recognition systems perform very well on datasets without noise and mismatch. However, performance degrades under environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, where they achieved better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used Mel Spectrogram and similar inputs rather than handcrafted features such as MFCC and GFCC. However, the performance of Mel Spectrogram features degrades at high noise levels and under mismatch in the utterances. Similar to the Mel Spectrogram, the Cochleogram is another important feature for deep learning speaker recognition models. Like GFCC features, the Cochleogram represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which is important in noisy conditions. However, no study has analyzed the noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition. In addition, only a limited number of studies have used the Cochleogram to develop speech-based models in noisy and mismatched conditions using deep learning. In this study, the noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition using deep learning models is analyzed at Signal to Noise Ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 and noise-added VoxCeleb1 datasets using basic 2DCNN, ResNet-50, VGG-16, ECAPA-TDNN and TitaNet model architectures. The speaker identification and verification performance of both Cochleogram and Mel Spectrogram features is evaluated. The results show that the Cochleogram performs better than the Mel Spectrogram in both speaker identification and verification under noisy and mismatched conditions.
first_indexed	2024-03-11T10:06:37Z
format	Article
id	doaj.art-29fe48e18d3c4f578be1a2a59128cc61
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-11T10:06:37Z
publishDate	2022-12-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-29fe48e18d3c4f578be1a2a59128cc61 | 2023-11-16T14:58:48Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2022-12-01 | 13 | 1 | 569 | 10.3390/app13010569 | Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition | Wondimu Lambamo [0] | Ramasamy Srinivasagan [1] | Worku Jifara [2] | Computer Science and Engineering Department, Adama Science and Technology University, Adama P.O. Box 1888, Ethiopia | Computer Engineering, King Faisal University, Al Hofuf 31982, Al-Ahsa, Saudi Arabia | Computer Science and Engineering Department, Adama Science and Technology University, Adama P.O. Box 1888, Ethiopia | https://www.mdpi.com/2076-3417/13/1/569 | speaker identification; speaker verification; Mel Spectrogram; Cochleogram; 2DCNN; ResNet-50
title	Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition
topic	speaker identification; speaker verification; Mel Spectrogram; Cochleogram; 2DCNN; ResNet-50
url	https://www.mdpi.com/2076-3417/13/1/569
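
The abstract in this record describes experiments on noise-added VoxCeleb1 at SNR levels from −5 dB to 20 dB, but it does not say how the noise was mixed in. The following is a minimal sketch of one common way to do it: scale a noise signal so that the speech-to-noise power ratio matches a target SNR, then add it to the clean signal. The function names, the sine-wave stand-in for speech, the Gaussian noise, and the 5 dB step between levels are all assumptions made for illustration and are not taken from the article.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the target SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """Recompute the SNR from the clean signal and the added residual."""
    p_clean = sum(c * c for c in clean) / len(clean)
    p_resid = sum((y - c) ** 2 for c, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(p_clean / p_resid)


# Synthetic stand-ins (assumptions): one second of a 220 Hz tone at 16 kHz
# in place of an utterance, and white Gaussian noise in place of real noise.
rng = random.Random(0)
speech = [math.sin(2.0 * math.pi * 220.0 * t / 16000.0) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# The range -5 dB to 20 dB is from the abstract; the 5 dB step is assumed.
for snr_db in (-5, 0, 5, 10, 15, 20):
    noisy = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noisy), 3))
```

Each noisy signal could then be converted to a Mel Spectrogram or Cochleogram and passed to the model under test, so that both features are compared on identical noisy input at every SNR level.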

Similar Items