Time Delay Estimation for Sound Source Localization Using CNN-Based Multi-GCC Feature Fusion

Accurate time delay estimation is critical in sound source localization methods that rely on time difference of arrival. Background noise and reverberation often introduce errors in time delay estimation. Generalized cross-correlation (GCC) functions, paired with different weighting functions, can a...


Bibliographic Details
Main Authors:	Haitao Liu, Xiuliang Zhang, Penggao Li, Yu Yao, Sheng Zhang, Qian Xiao
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Sound source localization; time delay estimation; generalized cross-correlation; convolutional neural network; feature fusion
Online Access:	https://ieeexplore.ieee.org/document/10345585/

_version_	1797376321155760128
author	Haitao Liu, Xiuliang Zhang, Penggao Li, Yu Yao, Sheng Zhang, Qian Xiao
author_facet	Haitao Liu, Xiuliang Zhang, Penggao Li, Yu Yao, Sheng Zhang, Qian Xiao
author_sort	Haitao Liu
collection	DOAJ
description	Accurate time delay estimation is critical in sound source localization methods that rely on time difference of arrival. Background noise and reverberation often introduce errors in time delay estimation. Generalized cross-correlation (GCC) functions, paired with different weighting functions, can adapt to various sound field environments for time delay estimation. To create a highly accurate time delay estimation method suitable for universal sound field conditions, this paper proposes a novel approach, which involves training multi-class weighted generalized cross-correlation features using a convolutional neural network. Various weighted GCC functions are employed to extract time delay features for the same microphone pairs. These time delay features from multi-class weighted GCC are fused to create a feature matrix. The feature matrix is then input into a convolutional neural network composed of convolutional layers and fully connected layers for training and prediction. In the network, time delay estimation is achieved using two different methods: regression and classification, with mean squared error and cross-entropy serving as loss functions, respectively. The proposed method is tested and validated through simulation scenarios featuring various signal-to-noise ratios and reverberation conditions. Time delay estimation results are compared with recent state-of-the-art (SOTA) methods, assessing accuracy, root mean square error, and mean absolute error. The results demonstrate that the proposed method achieves an impressive 3.36% enhancement in overall delay estimation accuracy (within 10 cm), reduces the absolute error by 11.53%, and significantly decreases the estimated root mean square error by 16.07% compared to existing SOTA methods. Furthermore, the proposed model offers the advantages of compact size and efficient computational performance when compared to existing methods. These findings underscore the exceptional comprehensive performance of the proposed model in sound source localization applications.
first_indexed	2024-03-08T19:36:51Z
format	Article
id	doaj.art-efa6d36ae1604a00a7d6f161635ad988
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-08T19:36:51Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-efa6d36ae1604a00a7d6f161635ad988 | 2023-12-26T00:12:11Z | eng | IEEE | IEEE Access | 2169-3536 | 2023-01-01 | 11 | 140789 | 140800 | 10.1109/ACCESS.2023.3340108 | 10345585 | Time Delay Estimation for Sound Source Localization Using CNN-Based Multi-GCC Feature Fusion | Haitao Liu [0] https://orcid.org/0000-0002-3052-8709 | Xiuliang Zhang [1] | Penggao Li [2] | Yu Yao [3] | Sheng Zhang [4] | Qian Xiao [5] | School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China | School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China | School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China | School of Information and Telecommunications Engineering, Hainan University, Haikou, China | Suzhou Acoustic Technology Institute Company Ltd., Suzhou, China | School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China | Accurate time delay estimation is critical in sound source localization methods that rely on time difference of arrival. Background noise and reverberation often introduce errors in time delay estimation. Generalized cross-correlation (GCC) functions, paired with different weighting functions, can adapt to various sound field environments for time delay estimation. To create a highly accurate time delay estimation method suitable for universal sound field conditions, this paper proposes a novel approach, which involves training multi-class weighted generalized cross-correlation features using a convolutional neural network. Various weighted GCC functions are employed to extract time delay features for the same microphone pairs. These time delay features from multi-class weighted GCC are fused to create a feature matrix. The feature matrix is then input into a convolutional neural network composed of convolutional layers and fully connected layers for training and prediction. In the network, time delay estimation is achieved using two different methods: regression and classification, with mean squared error and cross-entropy serving as loss functions, respectively. The proposed method is tested and validated through simulation scenarios featuring various signal-to-noise ratios and reverberation conditions. Time delay estimation results are compared with recent state-of-the-art (SOTA) methods, assessing accuracy, root mean square error, and mean absolute error. The results demonstrate that the proposed method achieves an impressive 3.36% enhancement in overall delay estimation accuracy (within 10 cm), reduces the absolute error by 11.53%, and significantly decreases the estimated root mean square error by 16.07% compared to existing SOTA methods. Furthermore, the proposed model offers the advantages of compact size and efficient computational performance when compared to existing methods. These findings underscore the exceptional comprehensive performance of the proposed model in sound source localization applications. | https://ieeexplore.ieee.org/document/10345585/ | Sound source localization | time delay estimation | generalized cross-correlation | convolutional neural network | feature fusion
title	Time Delay Estimation for Sound Source Localization Using CNN-Based Multi-GCC Feature Fusion
topic	Sound source localization; time delay estimation; generalized cross-correlation; convolutional neural network; feature fusion
url	https://ieeexplore.ieee.org/document/10345585/
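The feature-fusion step described in the abstract (several weighted GCC functions computed for the same microphone pair, then stacked into one feature matrix) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record does not say which weightings the paper uses, so the three chosen here (unweighted, Roth, PHAT) are assumptions, as are the function names `gcc_features` and `estimate_delay` and the `max_lag` and `eps` parameters. The CNN stage is omitted; a simple per-row peak pick stands in for it.

```python
import numpy as np


def gcc_features(x1, x2, max_lag, eps=1e-12):
    """Stack weighted GCC curves for one microphone pair (hypothetical helper).

    Returns an array of shape (3, 2 * max_lag + 1). Rows are, in order,
    unweighted cross-correlation, Roth-weighted GCC, and GCC-PHAT.
    These weightings are assumed for illustration. Requires max_lag >= 1.
    """
    n = len(x1) + len(x2)                # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = np.conj(X1) * X2             # peak at positive lag when x2 lags x1
    weightings = (
        np.ones(cross.shape),            # plain cross-correlation
        1.0 / (np.abs(X1) ** 2 + eps),   # Roth: divide by auto-spectrum of x1
        1.0 / (np.abs(cross) + eps),     # PHAT: keep phase only
    )
    rows = []
    for w in weightings:
        r = np.fft.irfft(w * cross, n=n)
        # Reorder so the columns run from lag -max_lag to lag +max_lag.
        r = np.concatenate((r[-max_lag:], r[:max_lag + 1]))
        rows.append(r / (np.max(np.abs(r)) + eps))
    return np.stack(rows)


def estimate_delay(features, max_lag):
    """Peak-pick each row; returns one delay estimate (in samples) per weighting."""
    return np.argmax(features, axis=1) - max_lag
```

In the approach the abstract describes, a matrix like the one returned by `gcc_features` would be passed to a convolutional network trained with either a regression (mean squared error) or a classification (cross-entropy) objective, instead of the peak pick used in this sketch.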


Similar Items