Time Delay Estimation for Sound Source Localization Using CNN-Based Multi-GCC Feature Fusion

Accurate time delay estimation is critical in sound source localization methods that rely on time difference of arrival. Background noise and reverberation often introduce errors in time delay estimation. Generalized cross-correlation (GCC) functions, paired with different weighting functions, can a...

Bibliographic Details
Main Authors: Haitao Liu, Xiuliang Zhang, Penggao Li, Yu Yao, Sheng Zhang, Qian Xiao
Format: Article
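Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Sound source localization; time delay estimation; generalized cross-correlation; convolutional neural network; feature fusion
Online Access: https://ieeexplore.ieee.org/document/10345585/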
collection DOAJ
description Accurate time delay estimation is critical in sound source localization methods that rely on time difference of arrival. Background noise and reverberation often introduce errors in time delay estimation. Generalized cross-correlation (GCC) functions, paired with different weighting functions, can adapt to various sound field environments for time delay estimation. To create a highly accurate time delay estimation method suitable for universal sound field conditions, this paper proposes a novel approach, which involves training multi-class weighted generalized cross-correlation features using a convolutional neural network. Various weighted GCC functions are employed to extract time delay features for the same microphone pairs. These time delay features from multi-class weighted GCC are fused to create a feature matrix. The feature matrix is then input into a convolutional neural network composed of convolutional layers and fully connected layers for training and prediction. In the network, time delay estimation is achieved using two different methods: regression and classification, with mean squared error and cross-entropy serving as loss functions, respectively. The proposed method is tested and validated through simulation scenarios featuring various signal-to-noise ratios and reverberation conditions. Time delay estimation results are compared with recent state-of-the-art (SOTA) methods, assessing accuracy, root mean square error, and mean absolute error. The results demonstrate that the proposed method achieves an impressive 3.36% enhancement in overall delay estimation accuracy (within 10 cm), reduces the absolute error by 11.53%, and significantly decreases the estimated root mean square error by 16.07% compared to existing SOTA methods. Furthermore, the proposed model offers the advantages of compact size and efficient computational performance when compared to existing methods. These findings underscore the exceptional comprehensive performance of the proposed model in sound source localization applications.
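The feature-fusion step in the description can be illustrated with a minimal NumPy sketch. It computes several weighted GCC curves for one microphone pair and stacks them into a feature matrix of the kind that could be fed to a CNN. The function name `gcc_features`, the `max_lag` parameter, and the particular weightings (unweighted, PHAT, Roth) are assumptions chosen for illustration; the abstract does not say which weightings, frame lengths, or network layout the authors use.

```python
import numpy as np

def gcc_features(x1, x2, max_lag, eps=1e-12):
    """Stack several weighted GCC curves for one microphone pair.

    Hypothetical helper, not the authors' code. Returns an array of shape
    (n_weightings, 2 * max_lag + 1); column j corresponds to lag j - max_lag,
    and a positive lag means x2 is delayed relative to x1.
    """
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    p1 = np.abs(X1) ** 2                  # auto-power spectrum of x1

    # Illustrative weightings only; the paper's actual set is an open assumption.
    weightings = [
        np.ones_like(p1),                 # plain cross-correlation
        1.0 / (np.abs(cross) + eps),      # PHAT: keep phase, discard magnitude
        1.0 / (p1 + eps),                 # Roth: normalise by x1's power
    ]

    rows = []
    for w in weightings:
        r = np.fft.irfft(w * cross, n=n)
        # Reorder so the row runs over lags -max_lag .. +max_lag.
        r = np.concatenate((r[-max_lag:], r[:max_lag + 1]))
        rows.append(r / (np.max(np.abs(r)) + eps))
    return np.stack(rows)

# Synthetic check: x2 is x1 delayed by 5 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
x1 = np.concatenate((s, np.zeros(24)))
x2 = np.concatenate((np.zeros(5), s, np.zeros(19)))
features = gcc_features(x1, x2, max_lag=16)
estimated_lags = features.argmax(axis=1) - 16   # one delay estimate per weighting
```

In a setup like the one described, each row would act as one input channel or matrix row for the network. A regression head would predict the delay directly under a mean squared error loss, while a classification head would treat each candidate lag as a class under a cross-entropy loss.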
first_indexed 2024-03-08T19:36:51Z
id doaj.art-efa6d36ae1604a00a7d6f161635ad988
institution Directory Open Access Journal
issn 2169-3536
last_indexed 2024-03-08T19:36:51Z
spelling doaj.art-efa6d36ae1604a00a7d6f161635ad988 | 2023-12-26T00:12:11Z | eng | IEEE | IEEE Access | 2169-3536 | 2023-01-01 | vol. 11, pp. 140789-140800 | 10.1109/ACCESS.2023.3340108 | 10345585
Haitao Liu (https://orcid.org/0000-0002-3052-8709), School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China
Xiuliang Zhang, School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China
Penggao Li, School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China
Yu Yao, School of Information and Telecommunications Engineering, Hainan University, Haikou, China
Sheng Zhang, Suzhou Acoustic Technology Institute Company Ltd., Suzhou, China
Qian Xiao, School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China
topic Sound source localization
time delay estimation
generalized cross-correlation
convolutional neural network
feature fusion