Speech enhancement from fused features based on deep neural network and gated recurrent unit network

Abstract: Speech is easily corrupted by the external environment in practice, which results in the loss of important features. Deep learning has become a popular approach to speech enhancement because of its strong potential for solving nonlinear mapping problems over complex features. However, the deficiency...

Bibliographic Details
Main Authors:	Youming Wang, Jiali Han, Tianqi Zhang, Didi Qing
Format:	Article
Language:	English
Published:	SpringerOpen 2021-10-01
Series:	EURASIP Journal on Advances in Signal Processing
Subjects:	Speech enhancement, Deep neural network, Gated recurrent unit, Speech quality
Online Access:	https://doi.org/10.1186/s13634-021-00813-8

_version_	1819024971974836224
author	Youming Wang, Jiali Han, Tianqi Zhang, Didi Qing
author_facet	Youming Wang, Jiali Han, Tianqi Zhang, Didi Qing
author_sort	Youming Wang
collection	DOAJ
description	Abstract: Speech is easily corrupted by the external environment in practice, which results in the loss of important features. Deep learning has become a popular approach to speech enhancement because of its strong potential for solving nonlinear mapping problems over complex features. However, traditional deep learning methods are weak at learning important information from previous time steps and long-term dependencies in time-series data. To overcome this problem, we propose a novel speech enhancement method based on the fused features of deep neural networks (DNNs) and a gated recurrent unit (GRU) network. The proposed method uses the GRU to reduce the number of parameters of the DNN and to acquire the context information of the speech, which improves the quality and intelligibility of the enhanced speech. First, a DNN with multiple hidden layers is used to learn the mapping between the logarithmic power spectrum (LPS) features of noisy speech and clean speech. Second, the LPS feature output by the DNN is fused with the noisy speech as the input of the GRU network to compensate for the missing context information. Finally, the GRU network learns the mapping between these LPS features and the LPS features of clean speech. The proposed model is experimentally compared with traditional speech enhancement models, including DNN, CNN, LSTM and GRU. Experimental results demonstrate that the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53%, respectively, compared with the noisy signal under matched noise conditions. Under unmatched noise conditions, the PESQ and STOI of the algorithm are improved by 23.8% and 37.36%, respectively. The advantage of the proposed method is that it uses the key information of features to suppress noise in both matched and unmatched noise cases, and it outperforms other common methods in speech enhancement.
first_indexed	2024-12-21T05:03:16Z
format	Article
id	doaj.art-94c805699af54eca8b8f28dd8bc83a19
institution	Directory of Open Access Journals
issn	1687-6180
language	English
last_indexed	2024-12-21T05:03:16Z
publishDate	2021-10-01
publisher	SpringerOpen
record_format	Article
series	EURASIP Journal on Advances in Signal Processing
spelling	doaj.art-94c805699af54eca8b8f28dd8bc83a19 2022-12-21T19:15:12Z eng SpringerOpen EURASIP Journal on Advances in Signal Processing 1687-6180 2021-10-01 20211119 10.1186/s13634-021-00813-8 Speech enhancement from fused features based on deep neural network and gated recurrent unit network Youming Wang (0), Jiali Han (1), Tianqi Zhang (2), Didi Qing (3); School of Automation, Xi’an University of Posts and Telecommunications (all four authors) https://doi.org/10.1186/s13634-021-00813-8 Speech enhancement, Deep neural network, Gated recurrent unit, Speech quality
title	Speech enhancement from fused features based on deep neural network and gated recurrent unit network
topic	Speech enhancement, Deep neural network, Gated recurrent unit, Speech quality
url	https://doi.org/10.1186/s13634-021-00813-8
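
The abstract in this record describes a three-stage data flow: a DNN maps noisy LPS features toward clean ones, the DNN output is fused with the noisy features, and a GRU network maps the fused features to clean LPS features. The sketch below illustrates only that data flow, in plain Python with untrained random weights. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the layer sizes, the use of concatenation as the fusion step (the abstract does not say how features are fused), the linear output layer, and the omission of biases.

```python
import cmath
import math
import random


def log_power_spectrum(frame, eps=1e-10):
    """Log power spectrum of one frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    lps = []
    for k in range(n // 2 + 1):
        x_k = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(frame))
        lps.append(math.log(abs(x_k) ** 2 + eps))
    return lps


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (stands in for trained weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dnn_forward(x, layers):
    """Feed-forward stand-in for the DNN: ReLU hidden layers, linear output."""
    for i, w in enumerate(layers):
        x = matvec(w, x)
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def gru_step(x, h, p):
    """One standard GRU update, biases omitted for brevity."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    c = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # Convention used here: z weights the candidate state, (1 - z) the old state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]


def enhance(frames, layers, gru, w_out):
    """Run DNN -> fusion -> GRU over a sequence of time-domain frames."""
    h = [0.0] * len(gru["Uz"])
    estimates = []
    for frame in frames:
        noisy = log_power_spectrum(frame)
        dnn_est = dnn_forward(noisy, layers)
        fused = dnn_est + noisy  # ASSUMPTION: fusion = concatenation
        h = gru_step(fused, h, gru)
        estimates.append(matvec(w_out, h))  # linear map back to LPS bins
    return estimates


# Hypothetical sizes: 16-sample frames give 9 LPS bins.
rng = random.Random(0)
frame_len, bins, dnn_hidden, gru_hidden = 16, 9, 12, 8
layers = [rand_matrix(dnn_hidden, bins, rng), rand_matrix(bins, dnn_hidden, rng)]
gru = {name: rand_matrix(gru_hidden, 2 * bins, rng) for name in ("Wz", "Wr", "Wh")}
gru.update({name: rand_matrix(gru_hidden, gru_hidden, rng) for name in ("Uz", "Ur", "Uh")})
w_out = rand_matrix(bins, gru_hidden, rng)

# Two synthetic "noisy" frames: a sine wave plus uniform noise.
frames = [[math.sin(2 * math.pi * 3 * t / frame_len) + rng.uniform(-0.3, 0.3)
           for t in range(frame_len)] for _ in range(2)]
enhanced = enhance(frames, layers, gru, w_out)
```

Because the weights are random, the values in `enhanced` are meaningless as speech estimates. The point is that the GRU hidden state `h` carries context from one frame to the next, which is the role the abstract assigns to the GRU. A practical version would use a trained model in a deep learning framework.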

Similar Items