Speech enhancement using deep neural network based on mask estimation and harmonic regeneration noise reduction for single channel microphone
Main Author: | Md Jamal, Norezmi |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | TK5101-6720 Telecommunication. Including telegraphy, telephone, radio, radar, television |
Online Access: | http://eprints.uthm.edu.my/8464/1/24p%20NOREZMI%20MD%20JAMAL.pdf http://eprints.uthm.edu.my/8464/2/NOREZMI%20MD%20JAMAL%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/8464/3/NOREZMI%20MD%20JAMAL%20WATERMARK.pdf |
_version_ | 1825710578331549696 |
---|---|
author | Md Jamal, Norezmi |
collection | UTHM |
description | The development of speech-enabled mobile applications has greatly improved
human-computer interaction in recent years, because these applications are flexible and
convenient for users. Since the speech signal is captured in mobile conditions, it is
easily contaminated by background noise. This complicates subsequent processing
and calls for a speech enhancement algorithm. The performance of speech
applications degrades when the signal-to-noise ratio (SNR) is low and non-stationary
noise is present. Moreover, removing noise without distorting the speech is itself
challenging, because such distortion reduces the quality and intelligibility of the speech.
To overcome these issues, a supervised Deep Neural Network (DNN) is developed
that predicts a constrained Wiener Filter (cWF) target mask from Gammatone filter
bank power spectrum (GF-TF) features. The model trained with GF-TF features on a
cross-speech dataset produced promising results, and the proposed target mask
scored higher on the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time
Objective Intelligibility (STOI) measures. In addition, DNN prediction leaves residual
noise in the signal. To address this, a modified Harmonic Regeneration Noise Reduction
(HRNR) algorithm is proposed as a post-filtering stage. Results on the TIMIT dataset show
that the average STOI scores of the joint algorithm are higher than those of the DNN
alone, conventional HRNR and the Log Minimum Mean Square Error (Log-MMSE)
algorithm. At an SNR of -5 dB, the joint algorithm improves on the DNN algorithm by 4%,
on conventional HRNR by 36% and on Log-MMSE by 12%. The average PESQ score, by
contrast, is less affected by the post-filtering stage. This work therefore improves
speech intelligibility in noisy backgrounds at low SNR and can be deployed in
speech-enabled mobile applications. |
first_indexed | 2024-03-05T21:59:43Z |
format | Thesis |
id | uthm.eprints-8464 |
institution | Universiti Tun Hussein Onn Malaysia |
language | English |
last_indexed | 2024-03-05T21:59:43Z |
publishDate | 2022 |
record_format | dspace |
spelling | uthm.eprints-8464 (datestamp 2023-02-27T01:01:10Z), http://eprints.uthm.edu.my/8464/. Dated 2022-01; Thesis; NonPeerReviewed. Md Jamal, Norezmi (2022) Speech enhancement using deep neural network based on mask estimation and harmonic regeneration noise reduction for single channel microphone. Doctoral thesis, Universiti Tun Hussein Onn Malaysia. |
title | Speech enhancement using deep neural network based on mask estimation and harmonic regeneration noise reduction for single channel microphone |
topic | TK5101-6720 Telecommunication. Including telegraphy, telephone, radio, radar, television |
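The record's description names two processing stages. The first is a DNN that predicts a time-frequency mask. The second is an HRNR post-filter that removes the residual noise the mask leaves behind. The Python/NumPy sketch below shows how such a two-stage pipeline can fit together. It rests on several assumptions and is not the thesis's method:

- The DNN is replaced by an oracle Wiener-type ratio mask |S|²/(|S|²+|N|²), computed from the known clean and noise signals. This is not the constrained Wiener filter target used in the thesis.
- The post-filter follows the conventional HRNR formulation (Plapous, Marro and Scalart), not the modified version proposed in the thesis.
- The frame size, the half-wave rectification nonlinearity, the mixing weight ρ taken equal to the first-stage mask, plain overlap-add without a synthesis window, and all helper names (`wiener_mask`, `hrnr_postfilter`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np

FRAME = 256        # samples per frame (illustrative; not taken from the thesis)
HOP = FRAME // 2   # 50% overlap


def periodic_hann(n):
    # Periodic Hann windows at 50% overlap sum to exactly 1,
    # so overlap-adding the analysis frames reconstructs the signal.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def stft(x):
    w = periodic_hann(FRAME)
    n_frames = 1 + (len(x) - FRAME) // HOP
    frames = np.stack([x[i * HOP:i * HOP + FRAME] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)           # shape (n_frames, FRAME // 2 + 1)


def istft(Y, length):
    frames = np.fft.irfft(Y, n=FRAME, axis=1)
    y = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * HOP:i * HOP + FRAME] += frame      # plain overlap-add
    return y


def wiener_mask(S, N, eps=1e-12):
    # Hypothetical stand-in for the DNN output: an oracle Wiener-type ratio
    # mask |S|^2 / (|S|^2 + |N|^2), NOT the thesis's constrained Wiener target.
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return ps / (ps + pn + eps)


def hrnr_postfilter(X, mask, noise_psd, eps=1e-12):
    # Conventional HRNR-style post-filter applied to a mask-based estimate.
    S1 = mask * X                                        # first-stage estimate
    s1 = np.fft.irfft(S1, n=FRAME, axis=1)               # back to the time domain
    S_harmo = np.fft.rfft(np.maximum(s1, 0.0), axis=1)   # half-wave rectification regenerates harmonics
    rho = mask                                           # mixing weight (assumption: first-stage gain)
    xi = (rho * np.abs(S1) ** 2 + (1 - rho) * np.abs(S_harmo) ** 2) / (noise_psd + eps)
    gain = xi / (1.0 + xi)                               # Wiener gain from the refined a priori SNR
    return gain * X


def snr_db(ref, est):
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))


# Synthetic demo: a 200 Hz harmonic "voiced" signal in white noise at 0 dB SNR.
fs = 8000
t = np.arange(fs) / fs
clean = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 11))
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs) * np.sqrt(np.mean(clean ** 2))
noisy = clean + noise

S, N, X = stft(clean), stft(noise), stft(noisy)
mask = wiener_mask(S, N)
noise_psd = np.mean(np.abs(N) ** 2, axis=0)   # per-bin noise PSD (oracle, from the noise alone)
enhanced = istft(hrnr_postfilter(X, mask, noise_psd), len(noisy))

inner = slice(FRAME, len(noisy) - FRAME)      # skip edges not covered by two frames
in_snr = snr_db(clean[inner], noisy[inner])
out_snr = snr_db(clean[inner], enhanced[inner])
print(f"input SNR {in_snr:.1f} dB -> output SNR {out_snr:.1f} dB")
```

The test signal is strictly periodic, so rectifying the enhanced frames puts most of the regenerated energy at multiples of the 200 Hz fundamental. The post-filter therefore restores harmonic structure rather than adding broadband noise. Two changes would make this oracle sketch closer to a deployable system: replacing `wiener_mask` with a trained network's prediction, and estimating `noise_psd` from the noisy signal alone (for example, with a minimum-statistics tracker).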