A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions
Speaker recognition is an indispensable technology for biometrics. It distinguishes individuals based on their vocal patterns. In this paper, a joint confirmation method based on the Akaike Information Criterion (AIC) of reconstruction error (REE) and time complexity (AIC-Time joint confirmation method) is proposed to select the optimal decomposition rank of non-negative matrix factorization (NMF).
Main Authors: | Dongbo Liu, Liming Huang, Yu Fang, Weibo Wang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Speaker recognition, non-negative matrix factorization, ResNeXt, squeeze-excitation, Akaike information criterion |
Online Access: | https://ieeexplore.ieee.org/document/10214013/ |
_version_ | 1797243496705294336 |
---|---|
author | Dongbo Liu Liming Huang Yu Fang Weibo Wang |
author_facet | Dongbo Liu Liming Huang Yu Fang Weibo Wang |
author_sort | Dongbo Liu |
collection | DOAJ |
description | Speaker recognition is an indispensable technology for biometrics: it distinguishes individuals by their vocal patterns. In this paper, a joint confirmation method based on the Akaike Information Criterion (AIC) of the reconstruction error (REE) and on time complexity (the AIC-Time joint confirmation method) is proposed to select the optimal decomposition rank of non-negative matrix factorization (NMF). NMF is then applied to the spectrogram to generate speaker features. The speaker-recognition network is a convolutional neural network that combines Squeeze-and-Excitation (SE) blocks with ResNeXt, and the best combination is determined experimentally. The SE block performs a channel-level adaptive adjustment of the feature maps, reducing redundancy and noise interference while improving feature-extraction efficiency and accuracy. The ResNeXt convolutional neural network executes multiple convolutional kernels in parallel, acquiring richer feature information. The experimental results demonstrate that, compared with spectrogram-based speaker recognition using Gaussian mixture models (GMM), the Visual Geometry Group Network (VGGNet), ResNet, and SE-ResNeXt, this method increases accuracy by an average of 5.8% and 16.24% with babble and factory1 noise superimposed at different signal-to-noise ratios, respectively. In the short-speech test, where the test set consists of 1 s and 2 s utterances with superimposed noise, the recognition rate increases by an average of 8.67% and 11.72%, respectively, compared with the other methods. |
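The abstract describes choosing the NMF decomposition rank by an AIC on the reconstruction error. The paper's exact AIC-Time criterion is not given in this record, so the sketch below is a minimal, hypothetical version: plain multiplicative-update NMF plus a generic Gaussian-residual AIC, with the rank chosen by minimum AIC over a candidate set. The function names, candidate ranks, and stand-in spectrogram are all illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Basic NMF via multiplicative updates (Lee-Seung), V ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-4
    H = rng.random((rank, T)) + 1e-4
    eps = 1e-10
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def aic_for_rank(V, rank):
    """Generic AIC from the NMF reconstruction error, assuming Gaussian residuals."""
    W, H = nmf(V, rank)
    n = V.size
    rss = np.sum((V - W @ H) ** 2)          # reconstruction error
    k = rank * (V.shape[0] + V.shape[1])    # free parameters in W and H
    return n * np.log(rss / n) + 2 * k

# Pick the rank with the lowest AIC from a candidate set.
V = np.abs(np.random.default_rng(1).normal(size=(64, 100)))  # stand-in spectrogram
ranks = [2, 4, 8]
best_rank = min(ranks, key=lambda r: aic_for_rank(V, r))
```

In a real pipeline, `V` would be a magnitude spectrogram and the chosen `W` (or `H`) columns would serve as the speaker features; the paper additionally weighs decomposition time, which this sketch omits.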
first_indexed | 2024-04-24T18:56:03Z |
format | Article |
id | doaj.art-441b02a63b9a41af8963a9bf5ac4d937 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-24T18:56:03Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-441b02a63b9a41af8963a9bf5ac4d9372024-03-26T17:34:49ZengIEEEIEEE Access2169-35362023-01-0111845008451310.1109/ACCESS.2023.330348510214013A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic ConditionsDongbo Liu0Liming Huang1https://orcid.org/0009-0003-3953-2610Yu Fang2https://orcid.org/0000-0002-0262-3872Weibo Wang3School of Electrical and Electronic Information, Xihua University, Chengdu, ChinaSchool of Electrical and Electronic Information, Xihua University, Chengdu, ChinaSchool of Electrical and Electronic Information, Xihua University, Chengdu, ChinaSchool of Electrical and Electronic Information, Xihua University, Chengdu, ChinaSpeaker recognition is an indispensable technology for biometrics. It distinguishes individuals based on their vocal patterns. In this paper, a joint confirmation method based on the Akaike Information Criterion (AIC) of reconstruction error (REE) and time complexity (AIC-Time joint confirmation method) is proposed to select the optimal decomposition rank of NMF. Furthermore, non-negative matrix factorization (NMF) is applied to the spectrogram to generate speaker features. The network for speaker recognition is based on Convolutional Neural Networks combining Squeeze Excitation (SE) blocks with ResNeXt, and the best combination is explored experimentally. The SE block conducts a channel-level adaptive adjustment of the feature maps, reducing redundancy and noise interference while improving feature extraction efficiency and accuracy. The ResNeXt convolutional neural network concurrently executes multiple convolutional kernels, acquiring richer feature information. The experimental results demonstrate that compared to speaker recognition based on Gaussian mixture models (GMM), Visual Geometry Group Network (VGGNet), ResNet, and SE-ResNeXt using spectrograms, this method increases the accuracy by an average of 5.8% and 16.24% with babble and factory1 noise superimposed at different signal-to-noise ratios, respectively. In the short-speech test, where the test set consists of 1 s and 2 s utterances with superimposed noise, the recognition rate increases by an average of 8.67% and 11.72%, respectively, compared with other methods.https://ieeexplore.ieee.org/document/10214013/Speaker recognitionnon-negative matrix factorizationResNeXtsqueeze-excitationakaike information criterion |
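The SE block's channel-level adaptive adjustment described in the abstract can be sketched without a deep-learning framework: squeeze (global average pooling per channel), excite (a small bottleneck MLP with sigmoid gating), and scale. This is a minimal NumPy illustration of the standard SE computation, not the paper's implementation; the weight shapes and the reduction ratio are assumptions.

```python
import numpy as np

def squeeze_excite(feature_maps, w1, w2):
    """Channel-wise reweighting as in a Squeeze-and-Excitation block (sketch).

    feature_maps: (C, H, W) activations from a convolutional layer.
    w1: (C // r, C) bottleneck weights (r = reduction ratio).
    w2: (C, C // r) expansion weights.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = feature_maps.mean(axis=(1, 2))              # (C,)
    # Excite: bottleneck MLP with ReLU, then sigmoid gating per channel.
    s = np.maximum(w1 @ z, 0.0)                     # (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))         # (C,), each in (0, 1)
    # Scale: reweight every channel's feature map by its gate.
    return feature_maps * gates[:, None, None]

# Toy usage with random activations and weights (reduction ratio r = 4).
rng = np.random.default_rng(0)
x = rng.random((8, 4, 4))           # 8 channels of 4x4 maps
w1 = rng.normal(size=(2, 8))
w2 = rng.normal(size=(8, 2))
y = squeeze_excite(x, w1, w2)
```

Because the sigmoid gates lie in (0, 1), the block can only attenuate channels, which is how it suppresses redundant or noisy feature maps while leaving informative ones nearly unchanged.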
spellingShingle | Dongbo Liu Liming Huang Yu Fang Weibo Wang A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions IEEE Access Speaker recognition non-negative matrix factorization ResNeXt squeeze-excitation akaike information criterion |
title | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_full | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_fullStr | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_full_unstemmed | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_short | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_sort | method for speaker recognition based on the resnext network under challenging acoustic conditions |
topic | Speaker recognition non-negative matrix factorization ResNeXt squeeze-excitation akaike information criterion |
url | https://ieeexplore.ieee.org/document/10214013/ |
work_keys_str_mv | AT dongboliu amethodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT liminghuang amethodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT yufang amethodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT weibowang amethodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT dongboliu methodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT liminghuang methodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT yufang methodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT weibowang methodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions |