Speech Enhancement Algorithm Based on a Convolutional Neural Network Reconstruction of the Temporal Envelope of Speech in Noisy Environments

Temporal modulation processing is a promising technique for improving the intelligibility and quality of speech in noise. We propose a speech enhancement algorithm that constructs the temporal envelope (TEV) in the time-frequency domain by means of an embedded convolutional neural network (CNN). To...

Bibliographic Details
Main Authors:	Rahim Soleymanpour, Mohammad Soleymanpour, Anthony J. Brammer, Michael T. Johnson, Insoo Kim
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Speech enhancement; temporal envelope (TEV); convolution neural network (CNN)
Online Access:	https://ieeexplore.ieee.org/document/10014997/

_version_	1797945066494361600
author	Rahim Soleymanpour, Mohammad Soleymanpour, Anthony J. Brammer, Michael T. Johnson, Insoo Kim
author_facet	Rahim Soleymanpour, Mohammad Soleymanpour, Anthony J. Brammer, Michael T. Johnson, Insoo Kim
author_sort	Rahim Soleymanpour
collection	DOAJ
description	Temporal modulation processing is a promising technique for improving the intelligibility and quality of speech in noise. We propose a speech enhancement algorithm that constructs the temporal envelope (TEV) in the time-frequency domain by means of an embedded convolutional neural network (CNN). To accomplish this, the input speech signals are divided into sixteen parallel frequency bands (subbands) with bandwidths approximating 1.5 times that of auditory filters. The corrupted TEVs in each subband are extracted and then fed to the 1-dimensional CNN (1-D CNN) model to restore the TEVs distorted by noise. The method is evaluated using 2,700 words from nine different talkers, mixed with speech-spectrum-shaped random noise (SSN) and babble noise at different signal-to-noise ratios. The Short-time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) metrics are used to evaluate the performance of the 1-D CNN algorithm. Results suggest that the 1-D CNN model improves STOI scores by an average of 27% and 34% for SSN and babble noise, respectively, and PESQ scores by an average of 19% and 18%, respectively, relative to unprocessed speech. The 1-D CNN model is also shown to outperform a conventional TEV-based speech enhancement algorithm.
first_indexed	2024-04-10T20:49:24Z
format	Article
id	doaj.art-b374a666d2dd4a349542ee65c68ad018
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-04-10T20:49:24Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-b374a666d2dd4a349542ee65c68ad018; 2023-01-24T00:00:42Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 5328; 5336; 10.1109/ACCESS.2023.3236242; 10014997; Speech Enhancement Algorithm Based on a Convolutional Neural Network Reconstruction of the Temporal Envelope of Speech in Noisy Environments; Rahim Soleymanpour [0] (https://orcid.org/0000-0001-7848-4138); Mohammad Soleymanpour [1]; Anthony J. Brammer [2]; Michael T. Johnson [3] (https://orcid.org/0000-0001-5424-4877); Insoo Kim [4] (https://orcid.org/0000-0001-6539-1776); [0] Department of Medicine, University of Connecticut School of Medicine, Farmington, CT, USA; [1] Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY, USA; [2] Department of Medicine, University of Connecticut School of Medicine, Farmington, CT, USA; [3] Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY, USA; [4] Department of Medicine, University of Connecticut School of Medicine, Farmington, CT, USA; https://ieeexplore.ieee.org/document/10014997/; Speech enhancement; temporal envelope (TEV); convolution neural network (CNN)
title	Speech Enhancement Algorithm Based on a Convolutional Neural Network Reconstruction of the Temporal Envelope of Speech in Noisy Environments
topic	Speech enhancement; temporal envelope (TEV); convolution neural network (CNN)
url	https://ieeexplore.ieee.org/document/10014997/
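
The description in this record mentions sixteen subbands about 1.5 times as wide as auditory filters, with a temporal envelope extracted in each. The sketch below is only an illustration of those two ideas, not the authors' implementation. It assumes the Glasberg and Moore ERB-number scale as the model of auditory filter bandwidth, a hypothetical lower edge of 80 Hz, and a crude envelope detector (full-wave rectification plus a one-pole low-pass at a hypothetical 30 Hz cutoff). The record does not state the filter bank design, the band limits, or the envelope extraction method, and the CNN stage is not shown.

```python
import math


def hz_to_erb_number(f_hz):
    """Map frequency in Hz to the Glasberg-Moore ERB-number scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def subband_edges(f_low_hz=80.0, n_bands=16, width_erb=1.5):
    """Edges of n_bands contiguous bands, each width_erb ERB units wide.

    f_low_hz is a hypothetical starting frequency (assumption, not from
    the record); n_bands and width_erb follow the record's description.
    """
    e0 = hz_to_erb_number(f_low_hz)
    return [erb_number_to_hz(e0 + width_erb * k) for k in range(n_bands + 1)]


def temporal_envelope(x, fs_hz, cutoff_hz=30.0):
    """Crude envelope: full-wave rectification, then a one-pole low-pass.

    cutoff_hz is an assumed value; this is a stand-in for whatever
    envelope extractor the paper actually uses.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    env, y = [], 0.0
    for sample in x:
        y += alpha * (abs(sample) - y)  # smooth the rectified signal
        env.append(y)
    return env


# Band edges for the assumed layout.
edges = subband_edges()

# Toy input: a 500 Hz tone whose amplitude drops from 1.0 to 0.1 halfway.
fs = 8000
tone = [
    math.sin(2.0 * math.pi * 500.0 * n / fs) * (1.0 if n < 4000 else 0.1)
    for n in range(8000)
]
env = temporal_envelope(tone, fs)
```

In a fuller pipeline each subband signal would first be isolated with a band-pass filter between consecutive entries of `edges`, and the resulting noisy envelopes would be the inputs that a restoration model tries to clean.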

Similar Items