Voice Conversion Using a Perceptual Criterion

Bibliographic Details
Main Author: Ki-Seung Lee
Format: Article
Language: English
Published: MDPI AG 2020-04-01
Series: Applied Sciences
Subjects: voice conversion; joint conversion; perceptual distance measure
Online Access: https://www.mdpi.com/2076-3417/10/8/2884
_version_ 1797570071473684480
author Ki-Seung Lee
author_facet Ki-Seung Lee
author_sort Ki-Seung Lee
collection DOAJ
description In voice conversion (VC), it is highly desirable to obtain transformed speech signals that are perceptually close to a target speaker’s voice. To this end, the proposed VC scheme adopts a perceptually meaningful criterion that takes the human auditory system into account when measuring the distance between the converted and the target voices. The conversion rules for the features associated with the spectral envelope and for the pitch modification factor were constructed jointly so that the perceptual distance measure was minimized. This minimization problem was solved using a deep neural network (DNN) framework in which the input features were derived from the source speech signals and the target features from a time-aligned version of the target speech signals. Validation tests were carried out on the CMU ARCTIC database to evaluate the effectiveness of the proposed method, especially in terms of perceptual quality. The experimental results showed that the proposed method yielded perceptually preferred results compared with independent conversion using the conventional mean-square error (MSE) criterion. The maximum improvement in perceptual evaluation of speech quality (PESQ) score over the conventional VC method was 0.312.
first_indexed 2024-03-10T20:19:52Z
format Article
id doaj.art-fb1f585af195451bbe205f19aedbe8d6
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T20:19:52Z
publishDate 2020-04-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-fb1f585af195451bbe205f19aedbe8d6 | 2023-11-19T22:19:47Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2020-04-01 | vol. 10, no. 8, art. 2884 | 10.3390/app10082884 | Voice Conversion Using a Perceptual Criterion | Ki-Seung Lee (Department of Electronic Engineering, Konkuk University, 1 Hwayang-dong, Gwangjin-gu, Seoul 143-701, Korea) | https://www.mdpi.com/2076-3417/10/8/2884 | voice conversion; joint conversion; perceptual distance measure
spellingShingle Ki-Seung Lee
Voice Conversion Using a Perceptual Criterion
Applied Sciences
voice conversion
joint conversion
perceptual distance measure
title Voice Conversion Using a Perceptual Criterion
title_full Voice Conversion Using a Perceptual Criterion
title_fullStr Voice Conversion Using a Perceptual Criterion
title_full_unstemmed Voice Conversion Using a Perceptual Criterion
title_short Voice Conversion Using a Perceptual Criterion
title_sort voice conversion using a perceptual criterion
topic voice conversion
joint conversion
perceptual distance measure
url https://www.mdpi.com/2076-3417/10/8/2884
work_keys_str_mv AT kiseunglee voiceconversionusingaperceptualcriterion