W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision

Abstract: Voice conversion (VC) on non-parallel data has achieved considerable breakthroughs in recent years through the use of self-supervised pre-trained representations (SSPR). Features extracted by the pre-trained model are expected to contain more content information. However, in common VC with SSPR,...

Bibliographic Details
Main Authors: Hao Huang, Lin Wang, Jichen Yang, Ying Hu, Liang He
Format: Article
Language: English
Published: SpringerOpen 2023-10-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Online Access: https://doi.org/10.1186/s13636-023-00312-8