A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions

Speaker recognition is an indispensable technology for biometrics: it distinguishes individuals by their vocal patterns. In this paper, non-negative matrix factorization (NMF) is applied to the spectrogram to generate speaker features, and a joint confirmation method based on the Akaike Information Criterion (AIC) of the reconstruction error (REE) and time complexity (the AIC-Time joint confirmation method) is proposed to select the optimal NMF decomposition rank. The recognition network is a convolutional neural network that combines Squeeze-and-Excitation (SE) blocks with ResNeXt, and the best combination of the two is determined experimentally. The SE block performs channel-level adaptive recalibration of the feature maps, reducing redundancy and noise interference while improving the efficiency and accuracy of feature extraction. The ResNeXt network executes multiple convolutional kernels in parallel, capturing richer feature information. Experimental results show that, compared with spectrogram-based speaker recognition using Gaussian mixture models (GMM), the Visual Geometry Group network (VGGNet), ResNet, and SE-ResNeXt, the proposed method improves accuracy by an average of 5.8% and 16.24% under superimposed babble and factory1 noise at various signal-to-noise ratios, respectively. In the short-speech test, where the test set consists of 1 s and 2 s utterances with noise superimposed, the recognition rate improves by an average of 8.67% and 11.72%, respectively, compared with the other methods.
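
The record does not give the exact form of the AIC-Time joint confirmation method, so the following is a minimal sketch of only its AIC-on-reconstruction-error half: candidate NMF ranks are scored with AIC under an assumed i.i.d. Gaussian residual model. The error model, the parameter count, and the candidate grid are illustrative assumptions, not the paper's formulation, which additionally trades this score off against the time complexity of the factorization.

import numpy as np
from sklearn.decomposition import NMF

def aic_for_rank(S, k):
    """Score one candidate NMF rank k on a magnitude spectrogram S (F x T)."""
    model = NMF(n_components=k, init="nndsvda", max_iter=400, random_state=0)
    W = model.fit_transform(S)              # F x k basis spectra
    H = model.components_                   # k x T activations
    rss = float(np.sum((S - W @ H) ** 2))   # reconstruction error energy
    n = S.size
    # AIC under an i.i.d. Gaussian residual assumption:
    # n * ln(RSS / n) + 2 * (number of free parameters in W and H)
    n_params = k * (S.shape[0] + S.shape[1])
    return n * np.log(rss / n) + 2 * n_params

def select_rank(S, candidates=range(8, 65, 8)):
    """Return the candidate rank with the lowest AIC score."""
    scores = {k: aic_for_rank(S, k) for k in candidates}
    return min(scores, key=scores.get)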

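The network description above corresponds to a standard SE-ResNeXt residual block: a grouped 3x3 convolution implements the ResNeXt split-transform-merge paths, and an SE branch rescales the channels before the residual addition. Below is a minimal PyTorch sketch of one such block; the cardinality (32), bottleneck width (4), and SE reduction ratio (16) are common defaults assumed for illustration, since the paper explores the actual SE/ResNeXt combination experimentally.

import torch
import torch.nn as nn

class SEResNeXtBlock(nn.Module):
    def __init__(self, channels, cardinality=32, bottleneck_width=4, reduction=16):
        super().__init__()
        inner = cardinality * bottleneck_width
        self.body = nn.Sequential(
            nn.Conv2d(channels, inner, kernel_size=1, bias=False),
            nn.BatchNorm2d(inner), nn.ReLU(inplace=True),
            # Grouped 3x3 convolution: the ResNeXt split-transform-merge paths.
            nn.Conv2d(inner, inner, kernel_size=3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(inner), nn.ReLU(inplace=True),
            nn.Conv2d(inner, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Squeeze-and-Excitation: global pooling -> bottleneck -> channel gates.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.body(x)
        out = out * self.se(out)   # channel-wise recalibration
        return self.relu(out + x)  # residual connection

A stack of such blocks over the NMF-derived feature maps would play the role of the recognition backbone described in the abstract.
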
Bibliographic Details
Main Authors: Dongbo Liu, Liming Huang, Yu Fang, Weibo Wang (School of Electrical and Electronic Information, Xihua University, Chengdu, China)
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3303485
Subjects: speaker recognition; non-negative matrix factorization; ResNeXt; squeeze-excitation; Akaike information criterion
Online Access: https://ieeexplore.ieee.org/document/10214013/