SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network
Traditional unsupervised speech enhancement models often have problems such as non-aggregation of input feature information, which will introduce additional noise during training, thereby reducing the quality of the speech signal. In order to solve the above problems, this paper analyzed the impact...
Main Authors: | Rongchuang Lv, Niansheng Chen, Songlin Cheng, Guangyu Fan, Lei Rao, Xiaoyong Song, Wenjing Lv, Dingyu Yang |
---|---|
Format: | Article |
Language: | English |
Published: | AIMS Press, 2024-02-01 |
Series: | Mathematical Biosciences and Engineering |
Subjects: | speech enhancement, deep learning, generative adversarial network, autoencoder |
Online Access: | https://www.aimspress.com/article/doi/10.3934/mbe.2024172?viewType=HTML |
_version_ | 1827319270488408064 |
---|---|
author | Rongchuang Lv, Niansheng Chen, Songlin Cheng, Guangyu Fan, Lei Rao, Xiaoyong Song, Wenjing Lv, Dingyu Yang |
author_sort | Rongchuang Lv |
collection | DOAJ |
description | Traditional unsupervised speech enhancement models often fail to aggregate input feature information, which introduces additional noise during training and reduces the quality of the speech signal. To address this, the paper analyzes how the non-aggregation of input speech feature information affects model performance. It then introduces a temporal convolutional network (TCN) and proposes the SASEGAN-TCN speech enhancement model, which captures local feature information and aggregates global feature information to improve model performance and training stability. In simulation experiments, the model achieves a perceptual evaluation of speech quality (PESQ) score of 2.1636 and a short-time objective intelligibility (STOI) of 92.78% on the Valentini dataset, and 1.8077 and 83.54%, respectively, on the THCHS30 dataset. The enhanced speech was also fed to an acoustic model to verify recognition accuracy: the speech recognition error rate was reduced by 17.4%, a significant improvement over the baseline model. |
first_indexed | 2024-04-25T00:18:00Z |
format | Article |
id | doaj.art-a9dbf8ad8fca49fd8c73fc272df14e13 |
institution | Directory Open Access Journal |
issn | 1551-0018 |
language | English |
last_indexed | 2024-04-25T00:18:00Z |
publishDate | 2024-02-01 |
publisher | AIMS Press |
record_format | Article |
series | Mathematical Biosciences and Engineering |
spelling | doaj.art-a9dbf8ad8fca49fd8c73fc272df14e13; 2024-03-13T01:14:29Z; eng; AIMS Press; Mathematical Biosciences and Engineering; ISSN 1551-0018; 2024-02-01; vol. 21, no. 3, pp. 3860-3875; doi:10.3934/mbe.2024172; SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network; Rongchuang Lv (1), Niansheng Chen (1), Songlin Cheng (1), Guangyu Fan (1), Lei Rao (1), Xiaoyong Song (1), Wenjing Lv (2), Dingyu Yang (3); (1) School of Electronic Information Engineering, Shanghai Dianji University, Shanghai 201306, China; (2) School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China; (3) Alibaba Group, Shanghai 201203, China; https://www.aimspress.com/article/doi/10.3934/mbe.2024172?viewType=HTML; speech enhancement, deep learning, generative adversarial network, autoencoder |
title | SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network |
topic | speech enhancement, deep learning, generative adversarial network, autoencoder |
url | https://www.aimspress.com/article/doi/10.3934/mbe.2024172?viewType=HTML |
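
The description states that the temporal convolutional network captures local feature information and aggregates global feature information. The sketch below illustrates the general mechanism usually meant by that: stacking causal, dilated 1-D convolutions so that each layer sees local context while the stack as a whole covers a long span. It is a generic illustration in plain Python, not the authors' SASEGAN-TCN architecture; the function names, the kernel weights, and the dilation schedule `[1, 2, 4, 8]` are assumptions chosen for the example.

```python
def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: out[t] uses only x[t], x[t-d], x[t-2d], ...

    Positions before the start of the signal are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def tcn_stack(x, weights, dilations):
    """Apply one causal dilated convolution per dilation, with a residual sum.

    Hypothetical simplification: the same kernel is reused at every level and
    there is no nonlinearity or normalization.
    """
    h = list(x)
    for d in dilations:
        y = causal_dilated_conv(h, weights, d)
        h = [a + b for a, b in zip(h, y)]  # residual connection
    return h


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    signal = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(causal_dilated_conv(signal, [1.0, 1.0], dilation=2))
    print(tcn_stack(signal, [0.5, 0.25, 0.25], [1, 2, 4, 8]))
    print(receptive_field(3, [1, 2, 4, 8]))
```

With a kernel of size 3 and dilations 1, 2, 4, 8, each output sample depends on 31 input samples, although every individual layer looks at only three taps. Doubling the dilation at each level is a common choice because the receptive field then grows exponentially with depth while the parameter count grows only linearly.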