Sound Source Localization in Spherical Harmonics Domain Based on High-Order Ambisonics Signals Enhancement Neural Network

Compared with methods based on microphone arrays, sound source localization methods based on higher-order ambisonics (HOA) signals are no longer limited to specific array structures and exhibit better performance in multi-source scenarios. However, the estimation errors of HOA signals always limit t...

Bibliographic Details
Main Authors: Lezhong Wang, Leru Wang
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: High-order ambisonics; neural network; sound source localization
Online Access: https://ieeexplore.ieee.org/document/10477500/
_version_ 1827303765611380736
author Lezhong Wang
Leru Wang
author_facet Lezhong Wang
Leru Wang
author_sort Lezhong Wang
collection DOAJ
description Compared with methods based on microphone arrays, sound source localization methods based on higher-order ambisonics (HOA) signals are no longer limited to specific array structures and exhibit better performance in multi-source scenarios. However, the estimation errors of HOA signals always limit the available frequency band of localization algorithms, leading to a decrease in the accuracy and robustness of localization algorithms. To address this problem, we propose a sound source localization method that combines an HOA signals enhancement neural network (referred to as the network model). This method uses a convolutional neural network (CNN) to eliminate low-frequency noise and high-frequency aliasing errors in the HOA signals. It enhances the noise resistance of the network model by adding noise interference during the training. Because the network model improves the consistency of spatial features in each frequency band of the HOA signals, we directly used the full-band frequency smoothing algorithm to improve the accuracy of the covariance matrix and combined it with the minimum variance distortionless response algorithm in the eigenbeam domain (i.e., spherical harmonics domain) (EB-MVDR) for sound source localization. Experimental results show that compared with the traditional EB-MVDR, the proposed sound source localization method can effectively improve the accuracy of multi-source localization in noisy and reverberant environments and has good performance under different numbers of sound sources.
first_indexed 2024-04-24T17:06:42Z
format Article
id doaj.art-e911bb7eedca4794ae603791ec0327e3
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-24T17:06:42Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-e911bb7eedca4794ae603791ec0327e3 2024-03-28T23:00:18Z eng IEEE IEEE Access 2169-3536 2024-01-01 12 44043 44054 10.1109/ACCESS.2024.3380355 10477500 Sound Source Localization in Spherical Harmonics Domain Based on High-Order Ambisonics Signals Enhancement Neural Network Lezhong Wang 0 https://orcid.org/0009-0005-0555-789X Leru Wang 1 College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin, China School of Information Science and Technology, Hangzhou Normal University, Hangzhou, China Compared with methods based on microphone arrays, sound source localization methods based on higher-order ambisonics (HOA) signals are no longer limited to specific array structures and exhibit better performance in multi-source scenarios. However, the estimation errors of HOA signals always limit the available frequency band of localization algorithms, leading to a decrease in the accuracy and robustness of localization algorithms. To address this problem, we propose a sound source localization method that combines an HOA signals enhancement neural network (referred to as the network model). This method uses a convolutional neural network (CNN) to eliminate low-frequency noise and high-frequency aliasing errors in the HOA signals. It enhances the noise resistance of the network model by adding noise interference during the training. Because the network model improves the consistency of spatial features in each frequency band of the HOA signals, we directly used the full-band frequency smoothing algorithm to improve the accuracy of the covariance matrix and combined it with the minimum variance distortionless response algorithm in the eigenbeam domain (i.e., spherical harmonics domain) (EB-MVDR) for sound source localization. Experimental results show that compared with the traditional EB-MVDR, the proposed sound source localization method can effectively improve the accuracy of multi-source localization in noisy and reverberant environments and has good performance under different numbers of sound sources. https://ieeexplore.ieee.org/document/10477500/ High-order ambisonics neural network sound source localization
spellingShingle Lezhong Wang
Leru Wang
Sound Source Localization in Spherical Harmonics Domain Based on High-Order Ambisonics Signals Enhancement Neural Network
IEEE Access
High-order ambisonics
neural network
sound source localization
title Sound Source Localization in Spherical Harmonics Domain Based on High-Order Ambisonics Signals Enhancement Neural Network
title_full Sound Source Localization in Spherical Harmonics Domain Based on High-Order Ambisonics Signals Enhancement Neural Network
title_fullStr Sound Source Localization in Spherical Harmonics Domain Based on High-Order Ambisonics Signals Enhancement Neural Network
title_full_unstemmed Sound Source Localization in Spherical Harmonics Domain Based on High-Order Ambisonics Signals Enhancement Neural Network
title_short Sound Source Localization in Spherical Harmonics Domain Based on High-Order Ambisonics Signals Enhancement Neural Network
title_sort sound source localization in spherical harmonics domain based on high order ambisonics signals enhancement neural network
topic High-order ambisonics
neural network
sound source localization
url https://ieeexplore.ieee.org/document/10477500/
work_keys_str_mv AT lezhongwang soundsourcelocalizationinsphericalharmonicsdomainbasedonhighorderambisonicssignalsenhancementneuralnetwork
AT leruwang soundsourcelocalizationinsphericalharmonicsdomainbasedonhighorderambisonicssignalsenhancementneuralnetwork