Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech

Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation. Therefore, we propose a novel method to improve...


Bibliographic Details
Main Authors: Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9775804/