MixGAN-TTS: Efficient and Stable Speech Synthesis Based on Diffusion Model

This paper describes MixGAN-TTS, an efficient and stable non-autoregressive speech synthesis method based on a diffusion model. MixGAN-TTS uses a linguistic encoder based on soft phoneme-level alignment and hard word-level alignment, an approach that explicitly extracts word-level semantic information, and...

Bibliographic Details
Main Authors: Yan Deng, Ning Wu, Chengjun Qiu, Yangyang Luo, Yan Chen
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10145456/