Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions

Bibliographic Details
Main Authors:	Lujun Li, Wudamu, Ludwig Kürzinger, Tobias Watzel, Gerhard Rigoll
Format:	Article
Language:	English
Published:	MDPI AG 2021-08-01
Series:	Applied Sciences
Subjects:	speech enhancement; generative adversarial networks; Sinc convolution; data augmentation; raw samples
Online Access:	https://www.mdpi.com/2076-3417/11/16/7564

_version_	1827685944695717888
author	Lujun Li, Wudamu, Ludwig Kürzinger, Tobias Watzel, Gerhard Rigoll
author_sort	Lujun Li
collection	DOAJ
description	Generative adversarial networks (GANs) have recently garnered significant attention for speech enhancement, where they typically process and reconstruct speech waveforms directly. Existing GANs for speech enhancement rely solely on the standard convolution operation, which may not accurately characterize the local information of speech signals, particularly high-frequency components. Sinc convolution was proposed to let a network learn more meaningful filters in its input layer and has achieved remarkable success in several speech signal processing tasks. Nevertheless, Sinc convolution for speech enhancement remains an under-explored research direction. This paper proposes Sinc–SEGAN, a novel generative adversarial architecture for speech enhancement that merges two powerful paradigms: Sinc convolution and the speech enhancement GAN (SEGAN). The proposed system has two highlights. First, it works in an end-to-end manner, avoiding the distortion caused by imperfect phase estimation. Second, it derives a customized filter bank, tuned compactly and efficiently for the target application. We empirically study how different Sinc convolution configurations affect performance, including the placement of the Sinc convolution layer, the length of the input signals, the number of Sinc filters, and the kernel size of the Sinc convolution. Moreover, we employ a set of time-domain data augmentation techniques, which further improve the system's performance on classic objective evaluation criteria for speech enhancement, as well as its generalization ability. Sinc–SEGAN outperforms all competitive baseline systems while using drastically fewer parameters, demonstrating its suitability for practical applications such as hearing aid design and cochlear implants.
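The abstract says Sinc convolution derives a customized filter bank in the input layer. The sketch below illustrates the general SincNet-style idea behind such a layer: each filter is a band-pass kernel g[n] = 2·f2·sinc(2π·f2·n) − 2·f1·sinc(2π·f1·n), fully determined by two cut-off frequencies f1 < f2 (in cycles/sample). This is not the authors' Sinc–SEGAN code. The Hamming window, the mel-spaced initialization of the cut-offs, and all function names (sinc_bandpass, sinc_filter_bank, filter_same) are assumptions made for illustration. The sketch also only builds and applies fixed kernels in plain Python. A trainable layer would instead learn the cut-offs by gradient descent inside a deep-learning framework.

```python
import math


def sinc(x: float) -> float:
    """Unnormalised sinc, sin(x) / x, with its limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass(f1: float, f2: float, kernel_size: int) -> list[float]:
    """Hamming-windowed band-pass kernel with cut-offs f1 < f2 in cycles/sample."""
    if kernel_size < 3 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd and >= 3 so the kernel is centred")
    if not 0.0 <= f1 < f2 <= 0.5:
        raise ValueError("need 0 <= f1 < f2 <= 0.5 (0.5 is the Nyquist frequency)")
    half = kernel_size // 2
    taps = []
    for k in range(kernel_size):
        n = k - half  # time index centred on zero
        # Ideal band-pass = ideal low-pass at f2 minus ideal low-pass at f1.
        g = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        # Hamming window tapers the truncated ideal response to reduce ripple.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * k / (kernel_size - 1))
        taps.append(g * w)
    return taps


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def sinc_filter_bank(num_filters: int, kernel_size: int, sample_rate: float,
                     f_min: float = 30.0) -> list[tuple[float, float, list[float]]]:
    """Hypothetical mel-spaced bank; returns (low_hz, high_hz, taps) per filter.

    In a SincNet-style trainable layer only the low cut-off and the band width
    would be learned: 2 parameters per filter instead of kernel_size weights.
    """
    lo_mel, hi_mel = hz_to_mel(f_min), hz_to_mel(sample_rate / 2)
    step = (hi_mel - lo_mel) / num_filters
    edges = [mel_to_hz(lo_mel + i * step) for i in range(num_filters + 1)]
    bank = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Clamp to Nyquist to absorb floating-point error from the mel round trip.
        taps = sinc_bandpass(lo / sample_rate, min(hi / sample_rate, 0.5), kernel_size)
        bank.append((lo, hi, taps))
    return bank


def filter_same(signal: list[float], taps: list[float]) -> list[float]:
    """Zero-padded, same-length filtering; taps are symmetric, so this equals convolution."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += h * signal[j]
        out.append(acc)
    return out


if __name__ == "__main__":
    fs, size = 16000, 251
    taps = sinc_bandpass(1000 / fs, 2000 / fs, size)  # 1 to 2 kHz band
    half = size // 2
    for tone_hz in (1500, 6000):
        x = [math.sin(2 * math.pi * tone_hz * t / fs) for t in range(2000)]
        y = filter_same(x, taps)[half:-half]  # drop samples affected by zero padding
        rms = math.sqrt(sum(v * v for v in y) / len(y))
        print(f"{tone_hz} Hz tone: output RMS {rms:.4f}")
    bank = sinc_filter_bank(num_filters=64, kernel_size=size, sample_rate=fs)
    print(f"{len(bank)} filters: {2 * len(bank)} cut-off parameters "
          f"vs {len(bank) * size} weights for free-form kernels")
```

Running the script passes a 1.5 kHz tone through a 1 to 2 kHz kernel almost unchanged, while a 6 kHz tone is strongly attenuated. Because each filter in this single-input-channel sketch is described by two numbers, 64 filters of kernel size 251 carry 128 cut-off parameters instead of 64 × 251 = 16,064 free weights. This is one way a Sinc front end can stay compact.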
first_indexed	2024-03-10T09:01:34Z
format	Article
id	doaj.art-9b33d41e58914b039dd52b878a5e84b0
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T09:01:34Z
publishDate	2021-08-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-9b33d41e58914b039dd52b878a5e84b0; 2023-11-22T06:43:32Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2021-08-01; vol. 11, no. 16, art. 7564; doi:10.3390/app11167564; Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions; Lujun Li, Wudamu, Ludwig Kürzinger, Tobias Watzel, Gerhard Rigoll (all: Department of Electrical and Computer Engineering, Technical University of Munich, 80333 Munich, Germany); https://www.mdpi.com/2076-3417/11/16/7564; speech enhancement, generative adversarial networks, Sinc convolution, data augmentation, raw samples
title	Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions
