SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network

Traditional unsupervised speech enhancement models often suffer from problems such as non-aggregation of input feature information, which introduces additional noise during training and reduces the quality of the speech signal. To address these problems, this paper analyzed the impact...


Bibliographic Details
Main Authors: Rongchuang Lv, Niansheng Chen, Songlin Cheng, Guangyu Fan, Lei Rao, Xiaoyong Song, Wenjing Lv, Dingyu Yang
Format: Article
Language: English
Published: AIMS Press 2024-02-01
Series: Mathematical Biosciences and Engineering
Subjects: speech enhancement; deep learning; generative adversarial network; autoencoder
Online Access: https://www.aimspress.com/article/doi/10.3934/mbe.2024172?viewType=HTML
_version_ 1797263752593145856
author Rongchuang Lv
Niansheng Chen
Songlin Cheng
Guangyu Fan
Lei Rao
Xiaoyong Song
Wenjing Lv
Dingyu Yang
author_sort Rongchuang Lv
collection DOAJ
description Traditional unsupervised speech enhancement models often suffer from problems such as non-aggregation of input feature information, which introduces additional noise during training and reduces the quality of the speech signal. To address these problems, this paper analyzed how the non-aggregation of input speech feature information affects model performance. It then introduced a temporal convolutional network (TCN) and proposed the SASEGAN-TCN speech enhancement model, which captures local feature information and aggregates global feature information to improve model performance and training stability. In simulation experiments, the model achieved a perceptual evaluation of speech quality (PESQ) score of 2.1636 and a short-time objective intelligibility (STOI) of 92.78% on the Valentini dataset, and 1.8077 and 83.54%, respectively, on the THCHS30 dataset. In addition, the enhanced speech data was fed to an acoustic model to verify recognition accuracy. The speech recognition error rate was reduced by 17.4%, a significant improvement over the baseline model.
first_indexed 2024-04-25T00:18:00Z
format Article
id doaj.art-a9dbf8ad8fca49fd8c73fc272df14e13
institution Directory of Open Access Journals
issn 1551-0018
language English
last_indexed 2024-04-25T00:18:00Z
publishDate 2024-02-01
publisher AIMS Press
record_format Article
series Mathematical Biosciences and Engineering
spelling doaj.art-a9dbf8ad8fca49fd8c73fc272df14e13
2024-03-13T01:14:29Z
eng
AIMS Press
Mathematical Biosciences and Engineering
ISSN 1551-0018
2024-02-01, vol. 21, no. 3, pp. 3860-3875
doi:10.3934/mbe.2024172
SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network
Rongchuang Lv (1), Niansheng Chen (1), Songlin Cheng (1), Guangyu Fan (1), Lei Rao (1), Xiaoyong Song (1), Wenjing Lv (2), Dingyu Yang (3)
1. School of Electronic Information Engineering, Shanghai Dianji University, Shanghai 201306, China
2. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
3. Alibaba Group, Shanghai 201203, China
title SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network
topic speech enhancement
deep learning
generative adversarial network
autoencoder
url https://www.aimspress.com/article/doi/10.3934/mbe.2024172?viewType=HTML