Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement

Acoustic echo in full-duplex telecommunication systems is a common problem that may cause desired-speech quality degradation during double-talk periods. This problem is especially challenging in low signal-to-echo ratio (SER) scenarios, such as hands-free conversations over mobile phones when the lo...


Bibliographic Details
Main Authors: Eran Shachar, Israel Cohen, Baruch Berdugo
Format: Article
Language: English
Published: MDPI AG 2022-08-01
Series: Acoustics
Subjects: residual echo suppression; acoustic echo cancellation; double-talk detection; deep-learning
Online Access: https://www.mdpi.com/2624-599X/4/3/39
author Eran Shachar
Israel Cohen
Baruch Berdugo
collection DOAJ
description Acoustic echo in full-duplex telecommunication systems is a common problem that may degrade the quality of the desired speech during double-talk periods. This problem is especially challenging in low signal-to-echo ratio (SER) scenarios, such as hands-free conversations over mobile phones when the loudspeaker volume is high. This paper proposes a two-stage deep-learning approach to residual echo suppression focused on the low SER scenario. The first stage consists of a speech spectrogram masking model integrated with a double-talk detector (DTD). The second stage consists of a spectrogram refinement model optimized for speech quality by minimizing a perceptual evaluation of speech quality (PESQ)-related loss function. The proposed integration of DTD with the masking model outperforms several other configurations based on previous studies. We conduct an ablation study that shows the contribution of each part of the proposed system. We evaluate the proposed system at several SERs and demonstrate its efficiency in the challenging setting of a very low SER. Finally, the proposed approach outperforms competing methods in several residual echo suppression metrics. We conclude that the proposed system is well-suited for the task of low SER residual echo suppression.
first_indexed 2024-03-10T01:04:44Z
format Article
id doaj.art-aa643fac09e946509002263c94730d69
institution Directory of Open Access Journals
issn 2624-599X
language English
last_indexed 2024-03-10T01:04:44Z
publishDate 2022-08-01
publisher MDPI AG
record_format Article
series Acoustics
spelling doaj.art-aa643fac09e946509002263c94730d69
2023-11-23T14:28:52Z
eng
MDPI AG
Acoustics
2624-599X
2022-08-01
vol. 4, no. 3, pp. 637-655
10.3390/acoustics4030039
Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement
Eran Shachar, Israel Cohen, Baruch Berdugo (all three: Andrew and Erna Viterbi Faculty of Electrical & Computer Engineering, Technion–Israel Institute of Technology, Technion City, Haifa 3200003, Israel)
https://www.mdpi.com/2624-599X/4/3/39
residual echo suppression
acoustic echo cancellation
double-talk detection
deep-learning
title Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement
topic residual echo suppression
acoustic echo cancellation
double-talk detection
deep-learning
url https://www.mdpi.com/2624-599X/4/3/39