Acoustic Word Embeddings for End-to-End Speech Synthesis

The most recent end-to-end speech synthesis systems use phonemes as acoustic input tokens and ignore which word each phoneme comes from. However, many words have their own prosody type, which may significantly affect naturalness. Prior works have employed pre-trained li...

Bibliographic Details
Main Authors: Feiyu Shen, Chenpeng Du, Kai Yu
Format: Article
Language: English
Published: MDPI AG 2021-09-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/11/19/9010