Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition

Speaker recognition systems perform very well on datasets without noise and mismatch. However, performance degrades with environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in impro...


Bibliographic Details
Main Authors: Wondimu Lambamo, Ramasamy Srinivasagan, Worku Jifara
Format: Article
Language: English
Published: MDPI AG 2022-12-01
Series: Applied Sciences
Subjects: speaker identification, speaker verification, Mel Spectrogram, Cochleogram, 2DCNN, ResNet-50
Online Access: https://www.mdpi.com/2076-3417/13/1/569
collection DOAJ
description Speaker recognition systems perform very well on datasets without noise and mismatch. However, performance degrades with environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, where they achieved better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used Mel Spectrogram and similar inputs rather than handcrafted features such as MFCC and GFCC. However, the performance of Mel Spectrogram features degrades at high noise levels and under mismatch in the utterances. Similar to Mel Spectrogram, Cochleogram is another important feature for deep learning speaker recognition models. Like GFCC features, Cochleogram represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which is important in noisy conditions. However, no previous study has analyzed the noise robustness of Cochleogram and Mel Spectrogram in speaker recognition. In addition, only a limited number of studies have used Cochleogram to develop speech-based models in noisy and mismatched conditions using deep learning. In this study, the noise robustness of Cochleogram and Mel Spectrogram features in deep learning-based speaker recognition is analyzed at Signal to Noise Ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 and noise-added VoxCeleb1 datasets using basic 2DCNN, ResNet-50, VGG-16, ECAPA-TDNN and TitaNet model architectures. The speaker identification and verification performance of both Cochleogram and Mel Spectrogram is evaluated. The results show that Cochleogram performs better than Mel Spectrogram in both speaker identification and verification under noisy and mismatched conditions.
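The abstract mentions a noise-added dataset at SNR levels from −5 dB to 20 dB. As a rough illustration only, the sketch below shows one common way to mix a noise recording into an utterance at a target SNR by scaling the noise power. It is not taken from the article: the function names (`mix_at_snr`, `measured_snr_db`), the toy sine and Gaussian signals, and the 5 dB step between levels are all assumptions made for this example.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Returns the noisy signal and the scaled noise (hypothetical helper).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measured_snr_db(speech, scaled_noise):
    """SNR in dB between a clean signal and the noise that was added to it."""
    return 10 * math.log10(power(speech) / power(scaled_noise))


# Toy stand-ins for an utterance and a noise recording (assumed, not real data).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# The 5 dB step is an assumption; the abstract only gives the range.
for snr_db in (-5, 0, 5, 10, 15, 20):
    noisy, scaled = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, scaled), 6))
```

Scaling the noise rather than the speech keeps the clean utterance unchanged across SNR levels, which makes features extracted at different levels easier to compare.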
id doaj.art-29fe48e18d3c4f578be1a2a59128cc61
institution Directory Open Access Journal
issn 2076-3417
spelling DOI: 10.3390/app13010569
Volume 13, issue 1, article 569
Author affiliations:
Wondimu Lambamo: Computer Science and Engineering Department, Adama Science and Technology University, Adama P.O. Box 1888, Ethiopia
Ramasamy Srinivasagan: Computer Engineering, King Faisal University, Al Hofuf 31982, Al-Ahsa, Saudi Arabia
Worku Jifara: Computer Science and Engineering Department, Adama Science and Technology University, Adama P.O. Box 1888, Ethiopia
topic speaker identification
speaker verification
Mel Spectrogram
Cochleogram
2DCNN
ResNet-50