MixGAN-TTS: Efficient and Stable Speech Synthesis Based on Diffusion Model

This paper describes MixGAN-TTS, an efficient and stable non-autoregressive speech synthesis method based on a diffusion model. MixGAN-TTS uses a linguistic encoder based on a soft phoneme-level alignment and hard word-level alignment approach, which explicitly extracts word-level semantic information, and...

Bibliographic Details
Main Authors:	Yan Deng, Ning Wu, Chengjun Qiu, Yangyang Luo, Yan Chen
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Speech synthesis; diffusion model; mixture attention mechanism; deep learning
Online Access:	https://ieeexplore.ieee.org/document/10145456/

Similar Items