End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement
Because of their simple design, end-to-end deep learning (E2E-DL) models have attracted considerable attention for speech enhancement. A number of DL models achieve excellent results in suppressing background noise and improving both the quality and the intelligibility of noisy speech...
Main Authors: | Rizwan Ullah, Lunchakorn Wuttisittikulkij, Sushank Chaudhary, Amir Parnianifard, Shashi Shah, Muhammad Ibrar, Fazal-E Wahab |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-10-01 |
Series: | Sensors |
Subjects: | E2E speech processing; Convolutional Encoder-Decoder; Convolutional Recurrent Network; speech quality; intelligibility |
Online Access: | https://www.mdpi.com/1424-8220/22/20/7782 |
_version_ | 1797470019090644992 |
author | Rizwan Ullah; Lunchakorn Wuttisittikulkij; Sushank Chaudhary; Amir Parnianifard; Shashi Shah; Muhammad Ibrar; Fazal-E Wahab |
author_facet | Rizwan Ullah; Lunchakorn Wuttisittikulkij; Sushank Chaudhary; Amir Parnianifard; Shashi Shah; Muhammad Ibrar; Fazal-E Wahab |
author_sort | Rizwan Ullah |
collection | DOAJ |
description | Because of their simple design, end-to-end deep learning (E2E-DL) models have attracted considerable attention for speech enhancement. A number of DL models achieve excellent results in suppressing background noise and improving both the quality and the intelligibility of noisy speech. Designing models that are resource-efficient and compact enough for real-time processing remains a key challenge. To improve the performance of E2E models, the sequential and local characteristics of the speech signal should be taken into account efficiently during modeling. In this paper, we present resource-efficient and compact neural models for end-to-end, noise-robust, waveform-based speech enhancement. By combining a Convolutional Encoder-Decoder (CED) and Recurrent Neural Networks (RNNs) in a Convolutional Recurrent Network (CRN) framework, we target several speech enhancement systems. Different noise types and speakers are used to train and test the proposed models. Experiments on LibriSpeech and the DEMAND dataset show that the proposed models achieve better quality and intelligibility with fewer trainable parameters, notably lower model complexity, and shorter inference time than existing recurrent and convolutional models. Quality and intelligibility improve by 31.61% and 17.18%, respectively, over the noisy speech. We further performed a cross-corpus analysis to demonstrate the generalization of the proposed E2E SE models across different speech datasets. |
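The CRN design described in the abstract — a convolutional encoder that captures local waveform structure, a recurrent bottleneck that models sequential context, and a decoder that reconstructs the waveform — can be sketched in miniature. This is a hypothetical toy illustration in NumPy with random, untrained weights and illustrative sizes; it is not the authors' actual model or code:

```python
import numpy as np

# Toy sketch of a Convolutional Recurrent Network (CRN) pipeline:
# conv encoder -> RNN bottleneck -> upsampling decoder.
# All layer sizes and weights here are illustrative assumptions.
rng = np.random.default_rng(0)

def encoder(x, w, stride=2):
    """Strided 1-D convolution: extracts local features and downsamples."""
    k = len(w)
    n_frames = (len(x) - k) // stride + 1
    return np.array([x[i * stride : i * stride + k] @ w for i in range(n_frames)])

def rnn_bottleneck(frames, w_in, w_rec, w_out):
    """Simple tanh RNN over encoded frames: models sequential context."""
    h = np.zeros(w_rec.shape[0])
    out = []
    for f in frames:
        h = np.tanh(w_in * f + w_rec @ h)
        out.append(w_out @ h)
    return np.array(out)

def decoder(frames, stride=2):
    """Nearest-neighbour upsampling back toward waveform resolution."""
    return np.repeat(frames, stride)

# Toy "noisy speech": a sine wave plus Gaussian noise, 64 samples.
t = np.linspace(0, 1, 64)
noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(64)

w_enc = rng.standard_normal(4) * 0.5        # encoder kernel
w_in = rng.standard_normal(8) * 0.5         # input -> hidden weights
w_rec = rng.standard_normal((8, 8)) * 0.3   # hidden -> hidden weights
w_out = rng.standard_normal(8) * 0.5        # hidden -> frame estimate

frames = encoder(noisy, w_enc)                            # (31,) encoded frames
enhanced_frames = rnn_bottleneck(frames, w_in, w_rec, w_out)
enhanced = decoder(enhanced_frames)                       # (62,) waveform estimate

print(frames.shape, enhanced.shape)  # → (31,) (62,)
```

In a real CRN the encoder and decoder are multi-layer (often with skip connections), the bottleneck is typically an LSTM or GRU stack, and all weights are learned end-to-end from noisy/clean waveform pairs; this sketch only shows how the three stages compose.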
first_indexed | 2024-03-09T19:30:49Z |
format | Article |
id | doaj.art-8a8e1b1f70734b9cabbac9e236f956e3 |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-09T19:30:49Z |
publishDate | 2022-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-8a8e1b1f70734b9cabbac9e236f956e3; indexed 2023-11-24T02:25:47Z; eng; MDPI AG; Sensors (1424-8220); 2022-10-01; Vol. 22, Issue 20, Article 7782; doi:10.3390/s22207782; End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement; Rizwan Ullah, Lunchakorn Wuttisittikulkij, Sushank Chaudhary, Amir Parnianifard, Shashi Shah (Wireless Communication Ecosystem Research Unit, Department of Electrical Engineering, Chulalongkorn University, Bangkok 10330, Thailand); Muhammad Ibrar (Department of Physics, Islamia College Peshawar, Peshawar 25000, Pakistan); Fazal-E Wahab (National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230026, China); abstract as given in the description field; https://www.mdpi.com/1424-8220/22/20/7782; keywords: E2E speech processing; Convolutional Encoder-Decoder; Convolutional Recurrent Network; speech quality; intelligibility |
spellingShingle | Rizwan Ullah; Lunchakorn Wuttisittikulkij; Sushank Chaudhary; Amir Parnianifard; Shashi Shah; Muhammad Ibrar; Fazal-E Wahab; End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement; Sensors; E2E speech processing; Convolutional Encoder-Decoder; Convolutional Recurrent Network; speech quality; intelligibility |
title | End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement |
title_full | End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement |
title_fullStr | End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement |
title_full_unstemmed | End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement |
title_short | End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement |
title_sort | end to end deep convolutional recurrent models for noise robust waveform speech enhancement |
topic | E2E speech processing; Convolutional Encoder-Decoder; Convolutional Recurrent Network; speech quality; intelligibility |
url | https://www.mdpi.com/1424-8220/22/20/7782 |
work_keys_str_mv | AT rizwanullah endtoenddeepconvolutionalrecurrentmodelsfornoiserobustwaveformspeechenhancement AT lunchakornwuttisittikulkij endtoenddeepconvolutionalrecurrentmodelsfornoiserobustwaveformspeechenhancement AT sushankchaudhary endtoenddeepconvolutionalrecurrentmodelsfornoiserobustwaveformspeechenhancement AT amirparnianifard endtoenddeepconvolutionalrecurrentmodelsfornoiserobustwaveformspeechenhancement AT shashishah endtoenddeepconvolutionalrecurrentmodelsfornoiserobustwaveformspeechenhancement AT muhammadibrar endtoenddeepconvolutionalrecurrentmodelsfornoiserobustwaveformspeechenhancement AT fazalewahab endtoenddeepconvolutionalrecurrentmodelsfornoiserobustwaveformspeechenhancement |