Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement
Acoustic echo in full-duplex telecommunication systems is a common problem that may cause desired-speech quality degradation during double-talk periods. This problem is especially challenging in low signal-to-echo ratio (SER) scenarios, such as hands-free conversations over mobile phones when the lo...
Main Authors: | Eran Shachar, Israel Cohen, Baruch Berdugo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2022-08-01 |
Series: | Acoustics |
Subjects: | residual echo suppression, acoustic echo cancellation, double-talk detection, deep-learning |
Online Access: | https://www.mdpi.com/2624-599X/4/3/39 |
_version_ | 1797492512828424192 |
---|---|
author | Eran Shachar; Israel Cohen; Baruch Berdugo |
author_facet | Eran Shachar; Israel Cohen; Baruch Berdugo |
author_sort | Eran Shachar |
collection | DOAJ |
description | Acoustic echo in full-duplex telecommunication systems is a common problem that may cause desired-speech quality degradation during double-talk periods. This problem is especially challenging in low signal-to-echo ratio (SER) scenarios, such as hands-free conversations over mobile phones when the loudspeaker volume is high. This paper proposes a two-stage deep-learning approach to residual echo suppression focused on the low SER scenario. The first stage consists of a speech spectrogram masking model integrated with a double-talk detector (DTD). The second stage consists of a spectrogram refinement model optimized for speech quality by minimizing a perceptual evaluation of speech quality (PESQ) related loss function. The proposed integration of DTD with the masking model outperforms several other configurations based on previous studies. We conduct an ablation study that shows the contribution of each part of the proposed system. We evaluate the proposed system in several SERs and demonstrate its efficiency in the challenging setting of a very low SER. Finally, the proposed approach outperforms competing methods in several residual echo suppression metrics. We conclude that the proposed system is well-suited for the task of low SER residual echo suppression. |
first_indexed | 2024-03-10T01:04:44Z |
format | Article |
id | doaj.art-aa643fac09e946509002263c94730d69 |
institution | Directory of Open Access Journals |
issn | 2624-599X |
language | English |
last_indexed | 2024-03-10T01:04:44Z |
publishDate | 2022-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Acoustics |
spelling | doaj.art-aa643fac09e946509002263c94730d69; 2023-11-23T14:28:52Z; eng; MDPI AG; Acoustics; 2624-599X; 2022-08-01; vol. 4, no. 3, pp. 637-655; 10.3390/acoustics4030039; Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement; Eran Shachar, Israel Cohen, Baruch Berdugo (all: Andrew and Erna Viterbi Faculty of Electrical & Computer Engineering, Technion–Israel Institute of Technology, Technion City, Haifa 3200003, Israel); https://www.mdpi.com/2624-599X/4/3/39; residual echo suppression; acoustic echo cancellation; double-talk detection; deep-learning |
spellingShingle | Eran Shachar; Israel Cohen; Baruch Berdugo; Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement; Acoustics; residual echo suppression; acoustic echo cancellation; double-talk detection; deep-learning |
title | Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement |
title_full | Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement |
title_fullStr | Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement |
title_full_unstemmed | Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement |
title_short | Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement |
title_sort | double talk detection aided residual echo suppression via spectrogram masking and refinement |
topic | residual echo suppression; acoustic echo cancellation; double-talk detection; deep-learning |
url | https://www.mdpi.com/2624-599X/4/3/39 |
work_keys_str_mv | AT eranshachar doubletalkdetectionaidedresidualechosuppressionviaspectrogrammaskingandrefinement AT israelcohen doubletalkdetectionaidedresidualechosuppressionviaspectrogrammaskingandrefinement AT baruchberdugo doubletalkdetectionaidedresidualechosuppressionviaspectrogrammaskingandrefinement |