Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning
Voice conversion (VC) transforms the speaking style of a source speaker to the speaking style of a target speaker by keeping linguistic information unchanged. Traditional VC techniques rely on parallel recordings of multiple speakers uttering the same sentences. Earlier approaches mainly find a mapp...
Main Authors: | Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-08-01 |
Series: | Applied Sciences |
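Subjects: | sinusoidal model; non-parallel voice conversion; generative adversarial networks; continuous parameters |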
Online Access: | https://www.mdpi.com/2076-3417/11/16/7489 |
_version_ | 1797524751512502272 |
---|---|
author | Mohammed Salah Al-Radhi Tamás Gábor Csapó Géza Németh |
author_facet | Mohammed Salah Al-Radhi Tamás Gábor Csapó Géza Németh |
author_sort | Mohammed Salah Al-Radhi |
collection | DOAJ |
description | Voice conversion (VC) transforms the speaking style of a source speaker into that of a target speaker while keeping the linguistic information unchanged. Traditional VC techniques rely on parallel recordings of multiple speakers uttering the same sentences. Earlier approaches mainly find a mapping between the given source–target speakers, which contain pairs of similar utterances spoken by different speakers. However, parallel data are computationally expensive and difficult to collect. Non-parallel VC remains an interesting but challenging speech processing task. To address this limitation, we propose a method that allows non-parallel many-to-many voice conversion by using a generative adversarial network. To the best of the authors’ knowledge, our study is the first to employ a sinusoidal model with continuous parameters to generate converted speech signals. Our method requires only several minutes of training examples without parallel utterances or time alignment procedures, where the source–target speakers are entirely unseen in the training dataset. Moreover, an empirical study is carried out on the publicly available CSTR VCTK corpus. Our conclusions indicate that the proposed method reached state-of-the-art results in speaker similarity to the utterance produced by the target speaker, while suggesting important structural ones to be further analyzed by experts. |
first_indexed | 2024-03-10T09:01:00Z |
format | Article |
id | doaj.art-0931fd748ab94e599540bb8bf560e1a8 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T09:01:00Z |
publishDate | 2021-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-0931fd748ab94e599540bb8bf560e1a8; 2023-11-22T06:42:25Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2021-08-01; vol. 11, no. 16, art. 7489; 10.3390/app11167489; Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning; Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh (all: Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, 1111 Budapest, Hungary); https://www.mdpi.com/2076-3417/11/16/7489; sinusoidal model; non-parallel voice conversion; generative adversarial networks; continuous parameters |
title | Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning |
topic | sinusoidal model non-parallel voice conversion generative adversarial networks continuous parameters |
url | https://www.mdpi.com/2076-3417/11/16/7489 |