Speaker Counting Based on a Novel Hive Shaped Nested Microphone Array by WPT and 2D Adaptive SRP Algorithms in Near-Field Scenarios

Sound source localization (SSL), speech enhancement, and speaker tracking are among the main application areas of speech processing. Most speech processing algorithms need to know the number of speakers for real implementation. In this article, a novel meth...


Bibliographic Details
Main Authors:	Ali Dehghan Firoozabadi, Pablo Adasme, David Zabala-Blanco, Pablo Palacios Játiva, Cesar Azurdia-Meza
Format:	Article
Language:	English
Published:	MDPI AG 2023-05-01
Series:	Sensors
Subjects:	speech processing; speaker counting; source localization; adaptive processing; microphone arrays; classification
Online Access:	https://www.mdpi.com/1424-8220/23/9/4499

collection	DOAJ
description	Sound source localization (SSL), speech enhancement, and speaker tracking are among the main application areas of speech processing, and most speech processing algorithms need to know the number of speakers for real implementation. This article proposes a novel method for estimating the number of speakers in near-field scenarios. The method combines a hive-shaped nested microphone array (HNMA), the wavelet packet transform (WPT), and a 2D sub-band adaptive steered response power (SB-2DASRP) algorithm with phase transform (PHAT) and maximum likelihood (ML) filters, followed by agglomerative clustering and the elbow criterion. The proposed HNMA eliminates aliasing and imaging and prepares suitable signals for the speaker counting method. The Blackman–Tukey spectral estimation method is then used to detect the relevant frequency components of the recorded signal. The WPT provides sub-band processing that focuses on the frequency bins of the speech signal. The SRP method is implemented in 2D and adaptively, applying the ML and PHAT filters to the sub-band signals. The SB-2DASRP peak positions are extracted over multiple time frames based on a standard deviation (SD) criterion, and the final number of speakers is estimated by unsupervised agglomerative clustering and the elbow criterion. The proposed HNMA-SB-2DASRP method is compared with the frequency-domain magnitude squared coherence (FD-MSC), i-vector probabilistic linear discriminant analysis (i-vector PLDA), ambisonics features of the correlational recurrent neural network (AF-CRNN), and speaker counting by density-based classification and clustering decision (SC-DCCD) algorithms in noisy and reverberant environments, and the results indicate the superiority of the proposed method for real implementation.
first_indexed	2024-03-11T04:06:33Z
id	doaj.art-1353841292b045e284723c7ab9431034
institution	Directory of Open Access Journals
issn	1424-8220
last_indexed	2024-03-11T04:06:33Z
spelling	doaj.art-1353841292b045e284723c7ab9431034; 2023-11-17T23:45:15Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2023-05-01; vol. 23, no. 9, art. 4499; DOI 10.3390/s23094499; Ali Dehghan Firoozabadi (Department of Electricity, Universidad Tecnológica Metropolitana, Av. José Pedro Alessandri 1242, Santiago 7800002, Chile); Pablo Adasme (Electrical Engineering Department, Universidad de Santiago de Chile, Av. Victor Jara 3519, Santiago 9170124, Chile); David Zabala-Blanco (Department of Computing and Industries, Universidad Católica del Maule, Talca 3466706, Chile); Pablo Palacios Játiva (Department of Electrical Engineering, Universidad de Chile, Santiago 8370451, Chile); Cesar Azurdia-Meza (Department of Electrical Engineering, Universidad de Chile, Santiago 8370451, Chile)
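
The abstract in this record says the final speaker count comes from unsupervised agglomerative clustering of SRP peak positions together with an elbow criterion. The sketch below illustrates that general idea only and is not the authors' implementation. The average-linkage rule, the "largest jump in merge distance" form of the elbow criterion, the function names, and the synthetic 2D peak positions are all assumptions made for illustration.

```python
import math
from itertools import combinations


def average_linkage(a, b):
    """Mean Euclidean distance over all cross-cluster point pairs."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def merge_distances(points):
    """Merge the two closest clusters until one remains; return each merge distance."""
    clusters = [[p] for p in points]
    distances = []
    while len(clusters) > 1:
        # combinations() yields i < j, so deleting index j keeps index i valid.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        distances.append(average_linkage(clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return distances


def count_by_elbow(points):
    """Estimate the cluster count from the largest jump between successive merges."""
    if len(points) < 3:
        raise ValueError("need at least 3 points to locate an elbow")
    d = merge_distances(points)
    gaps = [d[k + 1] - d[k] for k in range(len(d) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    # Stop just before the jump: k + 1 merges have been performed so far.
    return len(points) - (k + 1)


# Hypothetical peak positions (metres) gathered over several time frames,
# scattered around three made-up speaker locations.
peaks = [
    (1.00, 1.00), (1.05, 0.98), (0.97, 1.03), (1.02, 1.01),
    (3.00, 1.00), (2.96, 1.04), (3.03, 0.99), (3.01, 1.02),
    (2.00, 3.00), (2.04, 2.97), (1.98, 3.02), (2.01, 3.03),
]
estimated_speakers = count_by_elbow(peaks)
```

Because this form of the elbow rule needs at least one jump between merge distances, it cannot return a single cluster from tightly grouped peaks; a practical system would need a separate check for the one-speaker case.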


Similar Items