A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions
Speaker recognition is an indispensable technology for biometrics. It distinguishes individuals based on their vocal patterns. In this paper, a joint confirmation method based on the Akaike Information Criterion (AIC) of reconstruction error (REE) and time complexity (AIC-Time joint confirmation method) is proposed to select the optimal decomposition rank of non-negative matrix factorization (NMF).
Main Authors: | Dongbo Liu, Liming Huang, Yu Fang, Weibo Wang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Speaker recognition, non-negative matrix factorization, ResNeXt, squeeze-excitation, Akaike information criterion |
Online Access: | https://ieeexplore.ieee.org/document/10214013/ |
_version_ | 1797243496705294336 |
---|---|
author | Dongbo Liu Liming Huang Yu Fang Weibo Wang |
author_facet | Dongbo Liu Liming Huang Yu Fang Weibo Wang |
author_sort | Dongbo Liu |
collection | DOAJ |
description | Speaker recognition is an indispensable technology for biometrics: it distinguishes individuals by their vocal patterns. In this paper, a joint confirmation method based on the Akaike Information Criterion (AIC) of the reconstruction error (REE) and on time complexity (the AIC-Time joint confirmation method) is proposed to select the optimal decomposition rank of non-negative matrix factorization (NMF). NMF is then applied to the spectrogram to generate speaker features. The speaker-recognition network is a convolutional neural network that combines Squeeze-and-Excitation (SE) blocks with ResNeXt, and the best combination is determined experimentally. The SE block performs a channel-level adaptive adjustment of the feature maps, reducing redundancy and noise interference while improving feature-extraction efficiency and accuracy. The ResNeXt convolutional neural network executes multiple convolutional kernels in parallel, acquiring richer feature information. The experimental results demonstrate that, compared with spectrogram-based speaker recognition using Gaussian mixture models (GMM), the Visual Geometry Group Network (VGGNet), ResNet, and SE-ResNeXt, this method increases accuracy by an average of 5.8% and 16.24% with babble and factory1 noise superimposed at different signal-to-noise ratios, respectively. In the short-speech test, where the test set consists of 1 s and 2 s utterances with superimposed noise, the recognition rate increases by an average of 8.67% and 11.72%, respectively, compared with the other methods. |
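The abstract describes choosing the NMF decomposition rank by an AIC on the reconstruction error. The paper's exact AIC-Time criterion is not given in this record, so the sketch below is a minimal, hypothetical version: plain multiplicative-update NMF plus a generic Gaussian-residual AIC, with the rank chosen by minimum AIC over a candidate set. The function names, candidate ranks, and stand-in spectrogram are all illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Basic NMF via multiplicative updates (Lee-Seung), V ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-4
    H = rng.random((rank, T)) + 1e-4
    eps = 1e-10
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def aic_for_rank(V, rank):
    """Generic AIC from the NMF reconstruction error, assuming Gaussian residuals."""
    W, H = nmf(V, rank)
    n = V.size
    rss = np.sum((V - W @ H) ** 2)          # reconstruction error
    k = rank * (V.shape[0] + V.shape[1])    # free parameters in W and H
    return n * np.log(rss / n) + 2 * k

# Pick the rank with the lowest AIC from a candidate set.
V = np.abs(np.random.default_rng(1).normal(size=(64, 100)))  # stand-in spectrogram
ranks = [2, 4, 8]
best_rank = min(ranks, key=lambda r: aic_for_rank(V, r))
```

In a real pipeline, `V` would be a magnitude spectrogram and the chosen `W` (or `H`) columns would serve as the speaker features; the paper additionally weighs decomposition time, which this sketch omits.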
first_indexed | 2024-04-24T18:56:03Z |
format | Article |
id | doaj.art-441b02a63b9a41af8963a9bf5ac4d937 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-24T18:56:03Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-441b02a63b9a41af8963a9bf5ac4d9372024-03-26T17:34:49ZengIEEEIEEE Access2169-35362023-01-0111845008451310.1109/ACCESS.2023.330348510214013A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic ConditionsDongbo Liu0Liming Huang1https://orcid.org/0009-0003-3953-2610Yu Fang2https://orcid.org/0000-0002-0262-3872Weibo Wang3School of Electrical and Electronic Information, Xihua University, Chengdu, ChinaSchool of Electrical and Electronic Information, Xihua University, Chengdu, ChinaSchool of Electrical and Electronic Information, Xihua University, Chengdu, ChinaSchool of Electrical and Electronic Information, Xihua University, Chengdu, ChinaSpeaker recognition is an indispensable technology for biometrics. It distinguishes individuals based on their vocal patterns. In this paper, a joint confirmation method based on the Akaike Information Criterion (AIC) of reconstruction error (REE) and time complexity (AIC-Time joint confirmation method) is proposed to select the optimal decomposition rank of NMF. Furthermore, non-negative matrix factorization (NMF) is applied to the spectrogram to generate speaker features. The network for speaker recognition is based on Convolutional Neural Networks combining Squeeze Excitation (SE) blocks with ResNeXt, and the best combination is explored experimentally. The SE block conducts a channel-level adaptive adjustment of the feature maps, reducing redundancy and noise interference while improving feature extraction efficiency and accuracy. The ResNeXt convolutional neural network concurrently executes multiple convolutional kernels, acquiring richer feature information. The experimental results demonstrate that compared to speaker recognition based on Gaussian mixture models (GMM), Visual Geometry Group Network (VGGNet), ResNet, and SE-ResNeXt using spectrograms, this method increases the accuracy by an average of 5.8% and 16.24% with babble and factory1 noise superimposed at different signal-to-noise ratios, respectively. In the short-speech test, where the test set consists of 1 s and 2 s utterances with superimposed noise, the recognition rate increases by an average of 8.67% and 11.72%, respectively, compared with other methods.https://ieeexplore.ieee.org/document/10214013/Speaker recognitionnon-negative matrix factorizationResNeXtsqueeze-excitationakaike information criterion |
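The SE block's channel-level adaptive adjustment described in the abstract can be sketched without a deep-learning framework: squeeze (global average pooling per channel), excite (a small bottleneck MLP with sigmoid gating), and scale. This is a minimal NumPy illustration of the standard SE computation, not the paper's implementation; the weight shapes and the reduction ratio are assumptions.

```python
import numpy as np

def squeeze_excite(feature_maps, w1, w2):
    """Channel-wise reweighting as in a Squeeze-and-Excitation block (sketch).

    feature_maps: (C, H, W) activations from a convolutional layer.
    w1: (C // r, C) bottleneck weights (r = reduction ratio).
    w2: (C, C // r) expansion weights.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = feature_maps.mean(axis=(1, 2))              # (C,)
    # Excite: bottleneck MLP with ReLU, then sigmoid gating per channel.
    s = np.maximum(w1 @ z, 0.0)                     # (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))         # (C,), each in (0, 1)
    # Scale: reweight every channel's feature map by its gate.
    return feature_maps * gates[:, None, None]

# Toy usage with random activations and weights (reduction ratio r = 4).
rng = np.random.default_rng(0)
x = rng.random((8, 4, 4))           # 8 channels of 4x4 maps
w1 = rng.normal(size=(2, 8))
w2 = rng.normal(size=(8, 2))
y = squeeze_excite(x, w1, w2)
```

Because the sigmoid gates lie in (0, 1), the block can only attenuate channels, which is how it suppresses redundant or noisy feature maps while leaving informative ones nearly unchanged.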
spellingShingle | Dongbo Liu Liming Huang Yu Fang Weibo Wang A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions IEEE Access Speaker recognition non-negative matrix factorization ResNeXt squeeze-excitation akaike information criterion |
title | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_full | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_fullStr | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_full_unstemmed | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_short | A Method for Speaker Recognition Based on the ResNeXt Network Under Challenging Acoustic Conditions |
title_sort | method for speaker recognition based on the resnext network under challenging acoustic conditions |
topic | Speaker recognition non-negative matrix factorization ResNeXt squeeze-excitation akaike information criterion |
url | https://ieeexplore.ieee.org/document/10214013/ |
work_keys_str_mv | AT dongboliu amethodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT liminghuang amethodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT yufang amethodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT weibowang amethodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT dongboliu methodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT liminghuang methodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT yufang methodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions AT weibowang methodforspeakerrecognitionbasedontheresnextnetworkunderchallengingacousticconditions |