Time Delay Estimation for Sound Source Localization Using CNN-Based Multi-GCC Feature Fusion
Accurate time delay estimation is critical in sound source localization methods that rely on time difference of arrival. Background noise and reverberation often introduce errors in time delay estimation. Generalized cross-correlation (GCC) functions, paired with different weighting functions, can a...
Main Authors: | Haitao Liu, Xiuliang Zhang, Penggao Li, Yu Yao, Sheng Zhang, Qian Xiao |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
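Subjects: | Sound source localization; time delay estimation; generalized cross-correlation; convolutional neural network; feature fusion |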
Online Access: | https://ieeexplore.ieee.org/document/10345585/ |
_version_ | 1797376321155760128 |
---|---|
author | Haitao Liu, Xiuliang Zhang, Penggao Li, Yu Yao, Sheng Zhang, Qian Xiao |
author_facet | Haitao Liu, Xiuliang Zhang, Penggao Li, Yu Yao, Sheng Zhang, Qian Xiao |
author_sort | Haitao Liu |
collection | DOAJ |
description | Accurate time delay estimation is critical in sound source localization methods that rely on time difference of arrival, but background noise and reverberation often introduce estimation errors. Generalized cross-correlation (GCC) functions, paired with different weighting functions, can adapt to various sound field environments. To obtain an accurate time delay estimation method that works across sound field conditions, this paper proposes training a convolutional neural network on multi-class weighted GCC features. Several weighted GCC functions extract time delay features for the same microphone pairs, and these features are fused into a feature matrix. The feature matrix is fed to a convolutional neural network composed of convolutional layers and fully connected layers for training and prediction. The network estimates the time delay in two ways: regression, with mean squared error as the loss function, and classification, with cross-entropy as the loss function. The proposed method is tested and validated in simulation scenarios covering various signal-to-noise ratios and reverberation conditions. Its time delay estimates are compared with recent state-of-the-art (SOTA) methods in terms of accuracy, root mean square error, and mean absolute error. Compared with existing SOTA methods, the proposed method improves overall delay estimation accuracy (within 10 cm) by 3.36%, reduces the mean absolute error by 11.53%, and reduces the root mean square error by 16.07%. The proposed model is also more compact and computationally more efficient than existing methods. These findings indicate strong overall performance of the proposed model in sound source localization applications. |
first_indexed | 2024-03-08T19:36:51Z |
format | Article |
id | doaj.art-efa6d36ae1604a00a7d6f161635ad988 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T19:36:51Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-efa6d36ae1604a00a7d6f161635ad988; 2023-12-26T00:12:11Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; vol. 11; pp. 140789-140800; DOI 10.1109/ACCESS.2023.3340108; 10345585; Time Delay Estimation for Sound Source Localization Using CNN-Based Multi-GCC Feature Fusion; Haitao Liu (0, https://orcid.org/0000-0002-3052-8709); Xiuliang Zhang (1); Penggao Li (2); Yu Yao (3); Sheng Zhang (4); Qian Xiao (5); affiliations in listed order: School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China; School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China; School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China; School of Information and Telecommunications Engineering, Hainan University, Haikou, China; Suzhou Acoustic Technology Institute Company Ltd., Suzhou, China; School of Mechanotronics and Vehicle Engineering, East China Jiaotong University, Nanchang, China; abstract as given in the description field; https://ieeexplore.ieee.org/document/10345585/; Sound source localization; time delay estimation; generalized cross-correlation; convolutional neural network; feature fusion |
spellingShingle | Haitao Liu Xiuliang Zhang Penggao Li Yu Yao Sheng Zhang Qian Xiao Time Delay Estimation for Sound Source Localization Using CNN-Based Multi-GCC Feature Fusion IEEE Access Sound source localization time delay estimation generalized cross-correlation convolutional neural network feature fusion |
title | Time Delay Estimation for Sound Source Localization Using CNN-Based Multi-GCC Feature Fusion |
title_sort | time delay estimation for sound source localization using cnn based multi gcc feature fusion |
topic | Sound source localization; time delay estimation; generalized cross-correlation; convolutional neural network; feature fusion |
url | https://ieeexplore.ieee.org/document/10345585/ |
work_keys_str_mv | AT haitaoliu timedelayestimationforsoundsourcelocalizationusingcnnbasedmultigccfeaturefusion AT xiuliangzhang timedelayestimationforsoundsourcelocalizationusingcnnbasedmultigccfeaturefusion AT penggaoli timedelayestimationforsoundsourcelocalizationusingcnnbasedmultigccfeaturefusion AT yuyao timedelayestimationforsoundsourcelocalizationusingcnnbasedmultigccfeaturefusion AT shengzhang timedelayestimationforsoundsourcelocalizationusingcnnbasedmultigccfeaturefusion AT qianxiao timedelayestimationforsoundsourcelocalizationusingcnnbasedmultigccfeaturefusion |
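
The feature-fusion step described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice of three weightings (plain cross-correlation, PHAT, Roth), the lag window of ±20 samples, the per-row peak normalization, the synthetic white-noise source, and the function names `gcc` and `multi_gcc_features`. The sketch only shows how several weighted GCC curves for one microphone pair could be stacked into a feature matrix; the CNN itself is not reproduced.

```python
import numpy as np


def gcc(x1, x2, weighting, max_lag, eps=1e-12):
    """Weighted generalized cross-correlation for lags -max_lag..max_lag.

    A positive peak lag means x2 is delayed relative to x1.
    """
    n = len(x1) + len(x2)  # zero-pad so the circular correlation is linear
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)  # cross-power spectrum
    if weighting == "cc":  # unweighted cross-correlation
        weight = 1.0
    elif weighting == "phat":  # phase transform: keep phase, drop magnitude
        weight = 1.0 / (np.abs(cross) + eps)
    elif weighting == "roth":  # Roth: divide by the auto-spectrum of x1
        weight = 1.0 / (np.abs(X1) ** 2 + eps)
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    r = np.fft.irfft(cross * weight, n=n)
    # Reorder so index i corresponds to lag i - max_lag.
    return np.concatenate((r[-max_lag:], r[: max_lag + 1]))


def multi_gcc_features(x1, x2, weightings=("cc", "phat", "roth"), max_lag=20):
    """Stack peak-normalized GCC curves into a (n_weightings, 2*max_lag+1) matrix."""
    rows = []
    for name in weightings:
        r = gcc(x1, x2, name, max_lag)
        rows.append(r / (np.max(np.abs(r)) + 1e-12))
    return np.stack(rows)


# Synthetic microphone pair: x2 is x1 delayed by 7 samples, plus mild noise.
rng = np.random.default_rng(0)
true_delay = 7
max_lag = 20
s = rng.standard_normal(1100)
x1 = s[50:1050] + 0.1 * rng.standard_normal(1000)
x2 = s[50 - true_delay:1050 - true_delay] + 0.1 * rng.standard_normal(1000)

features = multi_gcc_features(x1, x2, max_lag=max_lag)
lags = np.arange(-max_lag, max_lag + 1)
peak_lags = lags[np.argmax(features, axis=1)]  # per-weighting delay estimate

# Hypothetical targets for the two training modes named in the abstract:
regression_target = float(true_delay)      # used with a mean squared error loss
class_label = true_delay + max_lag         # used with a cross-entropy loss
print(features.shape, peak_lags, class_label)
```

In this sketch each row of `features` is one weighting's correlation curve, so a network taking the matrix as input sees the same candidate lags under several weightings at once. SCOT is left out on purpose: for a single frame without spectral averaging it reduces to the same curve as PHAT.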