SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network

Traditional unsupervised speech enhancement models often have problems such as non-aggregation of input feature information, which will introduce additional noise during training, thereby reducing the quality of the speech signal. In order to solve the above problems, this paper analyzed the impact...

Bibliographic Details
Main Authors:	Rongchuang Lv, Niansheng Chen, Songlin Cheng, Guangyu Fan, Lei Rao, Xiaoyong Song, Wenjing Lv, Dingyu Yang
Format:	Article
Language:	English
Published:	AIMS Press 2024-02-01
Series:	Mathematical Biosciences and Engineering
Subjects:	speech enhancement; deep learning; generative adversarial network; autoencoder
Online Access:	https://www.aimspress.com/article/doi/10.3934/mbe.2024172?viewType=HTML

author	Rongchuang Lv, Niansheng Chen, Songlin Cheng, Guangyu Fan, Lei Rao, Xiaoyong Song, Wenjing Lv, Dingyu Yang
author_sort	Rongchuang Lv
collection	DOAJ
description	Traditional unsupervised speech enhancement models often fail to aggregate input feature information, which introduces additional noise during training and reduces the quality of the speech signal. To address this, the paper analyzes how the non-aggregation of input speech feature information affects model performance. It then introduces a temporal convolutional network (TCN) and proposes the SASEGAN-TCN speech enhancement model, which captures local feature information and aggregates global feature information to improve model performance and training stability. In simulation experiments, the model achieves a perceptual evaluation of speech quality (PESQ) score of 2.1636 and a short-time objective intelligibility (STOI) of 92.78% on the Valentini dataset, and 1.8077 and 83.54%, respectively, on the THCHS30 dataset. The enhanced speech was also fed to an acoustic model to verify recognition accuracy: the speech recognition error rate was reduced by 17.4%, a significant improvement over the baseline model.
first_indexed	2024-04-25T00:18:00Z
format	Article
id	doaj.art-a9dbf8ad8fca49fd8c73fc272df14e13
institution	Directory Open Access Journal
issn	1551-0018
language	English
last_indexed	2024-04-25T00:18:00Z
publishDate	2024-02-01
publisher	AIMS Press
record_format	Article
series	Mathematical Biosciences and Engineering
spelling	doaj.art-a9dbf8ad8fca49fd8c73fc272df14e13; 2024-03-13T01:14:29Z; eng; AIMS Press; Mathematical Biosciences and Engineering; ISSN 1551-0018; 2024-02-01; vol. 21, no. 3, pp. 3860-3875; DOI 10.3934/mbe.2024172; SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network; Rongchuang Lv (1), Niansheng Chen (1), Songlin Cheng (1), Guangyu Fan (1), Lei Rao (1), Xiaoyong Song (1), Wenjing Lv (2), Dingyu Yang (3); Affiliations: 1. School of Electronic Information Engineering, Shanghai Dianji University, Shanghai 201306, China; 2. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China; 3. Alibaba Group, Shanghai 201203, China
title	SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network
topic	speech enhancement; deep learning; generative adversarial network; autoencoder
url	https://www.aimspress.com/article/doi/10.3934/mbe.2024172?viewType=HTML

Similar Items