A Multi-Stage Acoustic Echo Cancellation Model Based on Adaptive Filters and Deep Neural Networks
Strong echo significantly impairs the quality and intelligibility of speech during communication. To address this issue, numerous studies have been conducted and many models proposed to cancel echo. In this study, we propose a multi-stage acoustic echo cancellation model that utilizes...
Main Authors: | Shiyun Xu, Changjun He, Bosong Yan, Mingjiang Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-07-01 |
Series: | Electronics |
Subjects: | acoustic echo cancellation, multi-stage model, adaptive filter, deep neural network |
Online Access: | https://www.mdpi.com/2079-9292/12/15/3258 |
_version_ | 1797586867221168128 |
---|---|
author | Shiyun Xu, Changjun He, Bosong Yan, Mingjiang Wang |
author_facet | Shiyun Xu, Changjun He, Bosong Yan, Mingjiang Wang |
author_sort | Shiyun Xu |
collection | DOAJ |
description | Strong echo significantly impairs the quality and intelligibility of speech during communication. To address this issue, numerous studies have been conducted and many models proposed to cancel echo. In this study, we propose a multi-stage acoustic echo cancellation model that utilizes an adaptive filter and a deep neural network. Our model consists of two parts: the Speex algorithm for canceling linear echo, and the multi-scale time-frequency UNet (MSTFUNet) for further echo cancellation. The Speex algorithm takes the far-end reference speech and the near-end microphone signal as inputs, and outputs the signal after linear echo cancellation. MSTFUNet takes the spectra of the far-end reference speech, the near-end microphone signal, and the output of Speex as inputs, and generates the estimated near-end speech spectrum as output. To enhance the performance of the Speex algorithm, we apply delay estimation and compensation to the far-end reference speech. For MSTFUNet, we employ multi-scale time-frequency processing to extract information from the input spectrum. Additionally, we incorporate an improved time-frequency self-attention to capture time-frequency information. Furthermore, we introduce channel time-frequency attention to alleviate information loss during downsampling and upsampling. In our experiments, we evaluate the performance of our proposed model on both our test set and the blind test set of the Acoustic Echo Cancellation challenge. Our proposed model outperforms other models in acoustic echo cancellation and in noise and reverberation suppression. |
first_indexed | 2024-03-11T00:29:15Z |
format | Article |
id | doaj.art-3d2b8e6cd9c34a35a5a59a7574ab8bc3 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-11T00:29:15Z |
publishDate | 2023-07-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-3d2b8e6cd9c34a35a5a59a7574ab8bc3; 2023-11-18T22:48:36Z; eng; MDPI AG; Electronics; 2079-9292; 2023-07-01; 12(15), 3258; 10.3390/electronics12153258; A Multi-Stage Acoustic Echo Cancellation Model Based on Adaptive Filters and Deep Neural Networks; Shiyun Xu, Changjun He, Bosong Yan, Mingjiang Wang (all: Key Laboratory for Key Technologies of IoT Terminals, Harbin Institute of Technology, Shenzhen 518055, China); https://www.mdpi.com/2079-9292/12/15/3258; acoustic echo cancellation; multi-stage model; adaptive filter; deep neural network |
title | A Multi-Stage Acoustic Echo Cancellation Model Based on Adaptive Filters and Deep Neural Networks |
topic | acoustic echo cancellation, multi-stage model, adaptive filter, deep neural network |
url | https://www.mdpi.com/2079-9292/12/15/3258 |
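
The processing chain summarized in the description field (delay estimation and compensation, linear echo cancellation, then a network fed with three spectra) can be illustrated with a minimal sketch. This is not the authors' implementation. A plain time-domain NLMS filter stands in for the Speex echo canceller, which is a different and more elaborate algorithm, and MSTFUNet is not reproduced at all: the sketch stops at the stacked input it would receive. All function names, the cross-correlation delay estimator, the use of magnitude spectra, and every parameter value (`taps`, `mu`, `frame`, `hop`, the synthetic echo path) are assumptions made for illustration.

```python
import numpy as np

def estimate_delay(far, mic, max_delay):
    """Return the lag in samples (0..max_delay) where mic best matches far."""
    # Cross-correlation restricted to non-negative lags (echo arrives after the reference).
    scores = [np.dot(mic[lag:], far[:len(far) - lag]) for lag in range(max_delay + 1)]
    return int(np.argmax(scores))

def compensate_delay(far, delay):
    """Shift the far-end reference later by `delay` samples, zero-padding the start."""
    if delay == 0:
        return far.copy()
    return np.concatenate([np.zeros(delay), far[:-delay]])

def nlms_echo_cancel(far, mic, taps=32, mu=0.5, eps=1e-8):
    """NLMS adaptive filter; returns mic minus the estimated linear echo."""
    w = np.zeros(taps)      # adaptive estimate of the echo path
    buf = np.zeros(taps)    # most recent far-end samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far[n]
        e = mic[n] - w @ buf                    # residual after removing the echo estimate
        w += mu * e * buf / (buf @ buf + eps)   # normalized LMS update
        out[n] = e
    return out

def stack_spectra(signals, frame=256, hop=128):
    """STFT each signal and stack magnitudes as channels: (channels, frames, bins)."""
    window = np.hanning(frame)
    specs = []
    for x in signals:
        n_frames = 1 + (len(x) - frame) // hop
        frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n_frames)])
        specs.append(np.abs(np.fft.rfft(frames, axis=1)))
    return np.stack(specs)

# Synthetic example: white-noise far-end signal, a short echo path, a 40-sample delay,
# and no near-end speech or noise (all hypothetical values).
rng = np.random.default_rng(0)
far = rng.standard_normal(8000)
true_delay = 40
echo = np.convolve(far, np.array([0.6, 0.3, -0.1]))[:len(far)]
mic = np.concatenate([np.zeros(true_delay), echo[:-true_delay]])

delay = estimate_delay(far, mic, max_delay=200)
far_aligned = compensate_delay(far, delay)
residual = nlms_echo_cancel(far_aligned, mic)

# The three inputs named in the description: far-end reference, microphone signal,
# and the output of the linear stage.
features = stack_spectra([far_aligned, mic, residual])
```

Aligning the reference first lets a short filter cover the echo path; without compensation, the 32-tap filter in this sketch could not reach an echo delayed by 40 samples.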