Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning

Bibliographic Details
Main Authors:	Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh
Format:	Article
Language:	English
Published:	MDPI AG 2021-08-01
Series:	Applied Sciences
Subjects:	sinusoidal model; non-parallel voice conversion; generative adversarial networks; continuous parameters
Online Access:	https://www.mdpi.com/2076-3417/11/16/7489

author	Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh
collection	DOAJ
description	Voice conversion (VC) transforms the speaking style of a source speaker into that of a target speaker while keeping the linguistic information unchanged. Traditional VC techniques rely on parallel recordings of multiple speakers uttering the same sentences. Earlier approaches mainly learn a mapping between a given source–target speaker pair from training data containing pairs of similar utterances spoken by the different speakers. However, parallel data are costly and difficult to collect. Non-parallel VC, by contrast, remains an interesting but challenging speech processing task. To address this limitation, we propose a method that enables non-parallel many-to-many voice conversion using a generative adversarial network (GAN). To the best of the authors’ knowledge, this is the first study to employ a sinusoidal model with continuous parameters to generate converted speech signals. The method requires only several minutes of training examples, with no parallel utterances or time-alignment procedures, and the source and target speakers are entirely unseen in the training dataset. An empirical study is carried out on the publicly available CSTR VCTK corpus. The results indicate that the proposed method achieves state-of-the-art speaker similarity to utterances produced by the target speaker, while pointing to important structural aspects that merit further analysis by experts.
first_indexed	2024-03-10T09:01:00Z
format	Article
id	doaj.art-0931fd748ab94e599540bb8bf560e1a8
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T09:01:00Z
publishDate	2021-08-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
doi	10.3390/app11167489
volume	11
issue	16
article_number	7489
affiliation	Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, 1111 Budapest, Hungary (all three authors)
title	Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning
topic	sinusoidal model; non-parallel voice conversion; generative adversarial networks; continuous parameters
url	https://www.mdpi.com/2076-3417/11/16/7489

Similar Items