Speaker and phoneme-aware speech bandwidth extension with residual dual-path network

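Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. It is believed that the general knowledge about the speaker and phonetic content strengthens the prediction. In this paper, we propose to augment the low-band acoustic features with i-vector and phonetic posteriorgram (PPG), which represent speaker and phonetic content of the speech, respectively. We also propose a residual dual-path network (RDPN) as the core module to process the augmented features, which fully utilizes the utterance-level temporal continuity information and avoids gradient vanishing. Experiments show that the proposed method achieves 20.2% and 7.0% relative improvements over the best baseline in terms of log-spectral distortion (LSD) and signal-to-noise ratio (SNR), respectively. Furthermore, our method is 16 times more compact than the best baseline in terms of the number of parameters.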
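
The description for this record reports results in log-spectral distortion (LSD) and signal-to-noise ratio (SNR). As a rough illustration of what those two metrics measure, the sketch below uses one commonly used formulation of each. The record does not give the paper's exact definitions, so the log base, the `eps` floor, and the function names `log_spectral_distortion` and `snr_db` are assumptions made here for illustration only.

```python
import math


def log_spectral_distortion(ref_power, est_power, eps=1e-10):
    """Assumed LSD form: per-frame RMS difference of log10 power spectra,
    averaged over frames. Inputs are lists of frames; each frame is a list
    of non-negative power-spectrum values, one per frequency bin."""
    if not ref_power or len(ref_power) != len(est_power):
        raise ValueError("inputs must be non-empty with the same frame count")
    total = 0.0
    for ref_frame, est_frame in zip(ref_power, est_power):
        if not ref_frame or len(ref_frame) != len(est_frame):
            raise ValueError("frames must be non-empty with matching bin counts")
        squared = [
            (math.log10(r + eps) - math.log10(e + eps)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        total += math.sqrt(sum(squared) / len(squared))
    return total / len(ref_power)


def snr_db(ref, est, eps=1e-10):
    """Assumed SNR form: 10*log10 of reference energy over error energy,
    computed on time-domain samples of equal length."""
    if not ref or len(ref) != len(est):
        raise ValueError("signals must be non-empty with the same length")
    signal = sum(x * x for x in ref)
    noise = sum((x - y) ** 2 for x, y in zip(ref, est))
    return 10.0 * math.log10((signal + eps) / (noise + eps))
```

Under these definitions a lower LSD and a higher SNR both indicate an estimate closer to the reference wideband signal.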

Bibliographic Details
Main Authors: Hou, Nana, Xu, Chenglin, Pham, Van Tung, Zhou, Joey Tianyi, Chng, Eng Siong, Li, Haizhou
Other Authors: School of Computer Science and Engineering
Format: Conference Paper
Language: English
Published: 2020
Subjects: Engineering::Computer science and engineering; Speech Enhancement; Speech Bandwidth Extension
Online Access: https://hdl.handle.net/10356/144854
_version_ 1811687265857961984
author Hou, Nana
Xu, Chenglin
Pham, Van Tung
Zhou, Joey Tianyi
Chng, Eng Siong
Li, Haizhou
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Hou, Nana
Xu, Chenglin
Pham, Van Tung
Zhou, Joey Tianyi
Chng, Eng Siong
Li, Haizhou
author_sort Hou, Nana
collection NTU
description Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. It is believed that the general knowledge about the speaker and phonetic content strengthens the prediction. In this paper, we propose to augment the low-band acoustic features with i-vector and phonetic posteriorgram (PPG), which represent speaker and phonetic content of the speech, respectively. We also propose a residual dual-path network (RDPN) as the core module to process the augmented features, which fully utilizes the utterance-level temporal continuity information and avoids gradient vanishing. Experiments show that the proposed method achieves 20.2% and 7.0% relative improvements over the best baseline in terms of log-spectral distortion (LSD) and signal-to-noise ratio (SNR), respectively. Furthermore, our method is 16 times more compact than the best baseline in terms of the number of parameters.
first_indexed 2024-10-01T05:13:34Z
format Conference Paper
id ntu-10356/144854
institution Nanyang Technological University
language English
last_indexed 2024-10-01T05:13:34Z
publishDate 2020
record_format dspace
spelling ntu-10356/1448542020-12-05T20:10:23Z Speaker and phoneme-aware speech bandwidth extension with residual dual-path network Hou, Nana Xu, Chenglin Pham, Van Tung Zhou, Joey Tianyi Chng, Eng Siong Li, Haizhou School of Computer Science and Engineering Interspeech 2020 Air Traffic Management Research Institute Engineering::Computer science and engineering Speech Enhancement Speech Bandwidth Extension Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. It is believed that the general knowledge about the speaker and phonetic content strengthens the prediction. In this paper, we propose to augment the low-band acoustic features with i-vector and phonetic posteriorgram (PPG), which represent speaker and phonetic content of the speech, respectively. We also propose a residual dual-path network (RDPN) as the core module to process the augmented features, which fully utilizes the utterance-level temporal continuity information and avoids gradient vanishing. Experiments show that the proposed method achieves 20.2% and 7.0% relative improvements over the best baseline in terms of log-spectral distortion (LSD) and signal-to-noise ratio (SNR), respectively. Furthermore, our method is 16 times more compact than the best baseline in terms of the number of parameters. National Research Foundation (NRF) Published version This work was supported by Air Traffic Management Research Institute of Nanyang Technological University, Human-Robot Interaction Phase 1 (Grant No. 192 25 00054), National Research Foundation (NRF) Singapore under the National Robotics Programme; AI Speech Lab (Award No. AISG-100E-2018-006), NRF Singapore under the AI Singapore Programme; Human Robot Collaborative AI for AME (Grant No. A18A2b0046), NRF Singapore; Neuromorphic Computing Programme (Grant No. A1687b0033), RIE 2020 AME Programmatic Grant. The work by H. Li is also partly supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy (University Allowance, EXC 2077, University of Bremen, Germany). 2020-11-30T07:17:30Z 2020-11-30T07:17:30Z 2020 Conference Paper Hou, N., Xu, C., Pham, V. T., Zhou, J. T., Chng, E. S., & Li, H. (2020). Speaker and phoneme-aware speech bandwidth extension with residual dual-path network. Interspeech 2020, 4064-4068. https://hdl.handle.net/10356/144854 4064 4068 en © 2020 International Speech Communication Association (ISCA). All rights reserved. This paper was published in Interspeech 2020 and is made available with permission of International Speech Communication Association (ISCA). application/pdf
spellingShingle Engineering::Computer science and engineering
Speech Enhancement
Speech Bandwidth Extension
Hou, Nana
Xu, Chenglin
Pham, Van Tung
Zhou, Joey Tianyi
Chng, Eng Siong
Li, Haizhou
Speaker and phoneme-aware speech bandwidth extension with residual dual-path network
title Speaker and phoneme-aware speech bandwidth extension with residual dual-path network
title_full Speaker and phoneme-aware speech bandwidth extension with residual dual-path network
title_fullStr Speaker and phoneme-aware speech bandwidth extension with residual dual-path network
title_full_unstemmed Speaker and phoneme-aware speech bandwidth extension with residual dual-path network
title_short Speaker and phoneme-aware speech bandwidth extension with residual dual-path network
title_sort speaker and phoneme aware speech bandwidth extension with residual dual path network
topic Engineering::Computer science and engineering
Speech Enhancement
Speech Bandwidth Extension
url https://hdl.handle.net/10356/144854