Machine Learning Approaches for Whisper to Normal Speech Conversion
Main Author: | Marco A. Oliveira |
---|---|
Format: | Article |
Language: | English |
Published: | Universidade do Porto, 2022-04-01 |
Series: | U.Porto Journal of Engineering |
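Subjects: | signal processing, machine learning, whispered speech, normal speech, voice conversion, speech conversion |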
Online Access: | https://journalengineering.fe.up.pt/index.php/upjeng/article/view/1298 |
field | value |
---|---|
author | Marco A. Oliveira |
collection | DOAJ |
description | Whispered speech is a mode of speech that differs from normal speech in its lack of a periodic component, namely the fundamental frequency (F0) that characterizes pitch, among other spectral and temporal differences. Much attention has been given in recent years to applying Machine Learning techniques to voice conversion tasks. Whisper-to-normal speech conversion is particularly challenging, however, especially with respect to F0 estimation. Based on the most recent literature, this survey assesses the state of the art in Machine Learning based whisper-to-normal speech conversion, identifying trends in both modeling and training approaches. The proposed solutions include frameworks based on Generative Adversarial Networks, Autoencoders and Bidirectional Long Short-Term Memory networks, among other Deep Neural Network architectures. The survey also covers parallel versus non-parallel training, time-alignment requirements and strategies, datasets, vocoder usage, and both objective and subjective evaluation metrics. |
issn | 2183-6493 |
doi | 10.24840/2183-6493_008.002_0016 |
volume | 8 |
issue | 2 |
affiliation | Faculty of Engineering, University of Porto |
orcid | https://orcid.org/0000-0002-3161-1109 |
title | Machine Learning Approaches for Whisper to Normal Speech Conversion |
topic | signal processing, machine learning, whispered speech, normal speech, voice conversion, speech conversion |
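
The abstract's opening claim, that whispered speech lacks the periodic component carried by the fundamental frequency (F0), can be illustrated with a simple autocorrelation-based voicing check. What follows is a minimal sketch under stated assumptions, not a method taken from the survey: a synthetic sum of harmonics stands in for a voiced frame of normal speech, Gaussian noise stands in for whispered excitation, and the 16 kHz sample rate, the 75 to 400 Hz F0 search range, the 0.5 voicing threshold and the helper names (`autocorr_peak`, `estimate_f0`, `harmonic_frame`, `noise_frame`) are all hypothetical choices.

```python
import math
import random

# Assumed analysis settings (hypothetical, for illustration only).
SAMPLE_RATE = 16_000          # samples per second
F0_MIN, F0_MAX = 75.0, 400.0  # plausible F0 search range in Hz
VOICING_THRESHOLD = 0.5       # minimum normalized autocorrelation to call a frame periodic


def autocorr_peak(frame, sample_rate=SAMPLE_RATE):
    """Return (lag, value) of the largest normalized autocorrelation within the F0 lag range."""
    min_lag = int(sample_rate / F0_MAX)
    max_lag = min(int(sample_rate / F0_MIN), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        head, tail = frame[:-lag], frame[lag:]
        cross = sum(a * b for a, b in zip(head, tail))
        norm = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        value = cross / norm if norm > 0.0 else 0.0
        if value > best_val:
            best_lag, best_val = lag, value
    return best_lag, best_val


def estimate_f0(frame, sample_rate=SAMPLE_RATE):
    """Return an F0 estimate in Hz, or None when the frame shows no periodic component."""
    lag, peak = autocorr_peak(frame, sample_rate)
    if peak < VOICING_THRESHOLD:
        return None
    return sample_rate / lag


def harmonic_frame(f0, n_samples=640, n_harmonics=5, sample_rate=SAMPLE_RATE):
    """Hypothetical stand-in for a voiced (normal speech) frame: a few harmonics of f0."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / sample_rate) / k
            for k in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def noise_frame(n_samples=640, seed=0):
    """Hypothetical stand-in for a whispered frame: aperiodic Gaussian noise excitation."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


if __name__ == "__main__":
    print(estimate_f0(harmonic_frame(125.0)))  # 125.0
    print(estimate_f0(noise_frame()))          # None
```

On the harmonic frame the normalized autocorrelation peaks at a lag of 128 samples, i.e. 16000 / 128 = 125 Hz; on the noise frame no lag in the search range comes near the threshold, so no F0 is returned. This is the sense in which F0 is hard to obtain in whisper-to-normal conversion: the whispered input offers no periodicity to measure. Plain peak picking like this is also prone to octave errors on real speech.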
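
The time-alignment requirement mentioned in the abstract arises in parallel training, where each whispered utterance is paired with a normal-speech recording of the same content spoken at a different rate, so the two feature sequences must be aligned frame by frame before a mapping can be learned. Dynamic time warping (DTW) is one widely used way to do this. The block below is a minimal textbook DTW over feature vectors with Euclidean frame distance, assuming features such as mel-cepstra have already been extracted; the function name `dtw_align` and the toy one-dimensional features are hypothetical, and this is not presented as the alignment strategy of any particular system in the survey.

```python
import math


def dtw_align(source, target):
    """Dynamic time warping between two sequences of equal-dimension feature vectors.

    Hypothetical helper for illustration. Returns (total_cost, path), where path is a
    list of (source_index, target_index) pairs from the start to the end of both sequences.
    """
    n, m = len(source), len(target)
    # cost[i][j] = minimal accumulated distance aligning source[:i] with target[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(source[i - 1], target[j - 1])  # Euclidean distance
            cost[i][j] = frame_dist + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance source only
                cost[i][j - 1],      # advance target only
            )

    # Backtrack from (n, m) to (0, 0), preferring the diagonal step on ties.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda ij: cost[ij[0]][ij[1]])
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Toy 1-D "features" (hypothetical): the target is the same sequence spoken more slowly.
    src = [(0.0,), (1.0,), (2.0,), (3.0,)]
    tgt = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (3.0,)]
    total, path = dtw_align(src, tgt)
    print(total)  # 0.0
    print(path)   # [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4), (3, 5)]
```

In the toy run the slower target repeats some frames, and the recovered path maps source frames 0 and 2 to two target frames each at zero total cost. Non-parallel training has no paired utterance, so this alignment step does not arise there.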