Machine Learning Approaches for Whisper to Normal Speech Conversion

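Whispered speech is a mode of speech that differs from normal speech due to the absence of a periodic component, namely the Fundamental Frequency that characterizes the pitch, among other spectral and temporal differences. Much attention has been given in recent years to the application of Machine Learning techniques for voice conversion tasks. The whisper-to-normal speech conversion is particularly challenging, however, especially with respect to the Fundamental Frequency estimation. Based on the most recent literature, this survey assesses the state-of-the-art regarding Machine Learning based whisper-to-normal speech conversion, identifying trends both on modeling and training approaches. The proposed solutions include Generative Adversarial Network based, Autoencoder based and Bidirectional Long Short-Term Memory based frameworks, among other Deep Neural Network based architectures. In addition to Parallel versus Non-Parallel training, time-alignment requirements and strategies, datasets, vocoder usage, as well as both objective and subjective evaluation metrics are also covered by the present survey.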


Bibliographic Details
Main Author: Marco A. Oliveira
Format: Article
Language: English
Published: Universidade do Porto 2022-04-01
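Series: U.Porto Journal of Engineering
Subjects: signal processing; machine learning; whispered speech; normal speech; voice conversion; speech conversion
Online Access: https://journalengineering.fe.up.pt/index.php/upjeng/article/view/1298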
author Marco A. Oliveira
collection DOAJ
description Whispered speech is a mode of speech that differs from normal speech due to the absence of a periodic component, namely the Fundamental Frequency that characterizes the pitch, among other spectral and temporal differences. Much attention has been given in recent years to the application of Machine Learning techniques for voice conversion tasks. The whisper-to-normal speech conversion is particularly challenging, however, especially with respect to the Fundamental Frequency estimation. Based on the most recent literature, this survey assesses the state-of-the-art regarding Machine Learning based whisper-to-normal speech conversion, identifying trends both on modeling and training approaches. The proposed solutions include Generative Adversarial Network based, Autoencoder based and Bidirectional Long Short-Term Memory based frameworks, among other Deep Neural Network based architectures. In addition to Parallel versus Non-Parallel training, time-alignment requirements and strategies, datasets, vocoder usage, as well as both objective and subjective evaluation metrics are also covered by the present survey.
first_indexed 2024-04-14T06:09:59Z
format Article
id doaj.art-d34e4004a2cd4232905b13a2687ee509
institution Directory of Open Access Journals
issn 2183-6493
language English
last_indexed 2024-04-14T06:09:59Z
publishDate 2022-04-01
publisher Universidade do Porto
record_format Article
series U.Porto Journal of Engineering
spelling doaj.art-d34e4004a2cd4232905b13a2687ee509 | 2022-12-22T02:08:23Z | eng | Universidade do Porto | U.Porto Journal of Engineering | 2183-6493 | 2022-04-01 | 8 | 2 | 202 | 212 | 10.24840/2183-6493_008.002_0016 | 1469 | Machine Learning Approaches for Whisper to Normal Speech Conversion | Marco A. Oliveira | 0 | https://orcid.org/0000-0002-3161-1109 | Faculty of Engineering, University of Porto | Whispered speech is a mode of speech that differs from normal speech due to the absence of a periodic component, namely the Fundamental Frequency that characterizes the pitch, among other spectral and temporal differences. Much attention has been given in recent years to the application of Machine Learning techniques for voice conversion tasks. The whisper-to-normal speech conversion is particularly challenging, however, especially with respect to the Fundamental Frequency estimation. Based on the most recent literature, this survey assesses the state-of-the-art regarding Machine Learning based whisper-to-normal speech conversion, identifying trends both on modeling and training approaches. The proposed solutions include Generative Adversarial Network based, Autoencoder based and Bidirectional Long Short-Term Memory based frameworks, among other Deep Neural Network based architectures. In addition to Parallel versus Non-Parallel training, time-alignment requirements and strategies, datasets, vocoder usage, as well as both objective and subjective evaluation metrics are also covered by the present survey. | https://journalengineering.fe.up.pt/index.php/upjeng/article/view/1298 | signal processing | machine learning | whispered speech | normal speech | voice conversion | speech conversion
title Machine Learning Approaches for Whisper to Normal Speech Conversion
topic signal processing
machine learning
whispered speech
normal speech
voice conversion
speech conversion
url https://journalengineering.fe.up.pt/index.php/upjeng/article/view/1298