Punctuation-generation-inspired linguistic features for Mandarin prosody generation

Abstract This paper proposes two novel linguistic features extracted from text input for prosody generation in a Mandarin text-to-speech system. The first feature is the punctuation confidence (PC), which measures the likelihood that a major punctuation mark (MPM) can be inserted at a word boundary....
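
The abstract does not say how the punctuation confidence is computed. One plausible reading, assumed here and not confirmed by this record, is that PC is the posterior marginal probability of the "major punctuation mark" label at each word boundary under a linear-chain conditional random field (the full abstract mentions conditional-random-field-based models). The sketch below computes such marginals with the forward-backward algorithm. The function name `punctuation_confidence`, the two-label setup, and the toy log-potentials are all hypothetical and only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def punctuation_confidence(unary, trans):
    """Marginal probability of label 1 (assumed: "MPM here") per word boundary.

    unary[t][k]: log-potential of label k at boundary t (hypothetical scores).
    trans[j][k]: log-potential of moving from label j to label k.
    """
    n, num_labels = len(unary), len(trans)
    alpha = [[0.0] * num_labels for _ in range(n)]
    beta = [[0.0] * num_labels for _ in range(n)]

    # Forward pass: alpha[t][k] scores all label prefixes ending in k at t.
    alpha[0] = list(unary[0])
    for t in range(1, n):
        for k in range(num_labels):
            alpha[t][k] = unary[t][k] + logsumexp(
                [alpha[t - 1][j] + trans[j][k] for j in range(num_labels)]
            )

    # Backward pass: beta[t][j] scores all label suffixes following j at t.
    for t in range(n - 2, -1, -1):
        for j in range(num_labels):
            beta[t][j] = logsumexp(
                [trans[j][k] + unary[t + 1][k] + beta[t + 1][k]
                 for k in range(num_labels)]
            )

    log_z = logsumexp(alpha[n - 1])  # log partition function
    return [math.exp(alpha[t][1] + beta[t][1] - log_z) for t in range(n)]


# Toy sentence with three word boundaries; the scores are made up.
toy_unary = [[0.0, -2.0], [0.0, 1.5], [0.3, 0.1]]
toy_trans = [[0.2, -0.5], [0.1, -1.0]]
print(punctuation_confidence(toy_unary, toy_trans))
```

Under this assumption, each returned value lies between 0 and 1 and could be appended to the other linguistic features of the corresponding word boundary before prosody prediction.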

Bibliographic Details
Main Authors: Chen-Yu Chiang, Yu-Ping Hung, Han-Yun Yeh, I-Bin Liao, Chen-Ming Pan
Format: Article
Language: English
Published: SpringerOpen 2019-02-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
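Subjects: Conditional random field; Multilayer perceptron; Text-to-speech system; Prosody generation; Linguistic feature; Speech synthesis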
Online Access: http://link.springer.com/article/10.1186/s13636-019-0147-y
_version_ 1818189908009811968
author Chen-Yu Chiang
Yu-Ping Hung
Han-Yun Yeh
I-Bin Liao
Chen-Ming Pan
author_sort Chen-Yu Chiang
collection DOAJ
description This paper proposes two novel linguistic features extracted from text input for prosody generation in a Mandarin text-to-speech system. The first feature is the punctuation confidence (PC), which measures the likelihood that a major punctuation mark (MPM) can be inserted at a word boundary. The second feature is the quotation confidence (QC), which measures the likelihood that a word string is quoted as a meaningful or emphasized unit. The proposed PC and QC features are inspired by the properties of automatic Chinese punctuation generation and the linguistic characteristics of the Chinese punctuation system. Because MPMs are highly correlated with prosodic–acoustic features and quoted word strings serve crucial roles in human language understanding, the two features could potentially provide useful information for prosody generation. This idea was realized by employing conditional-random-field-based models to predict MPMs, quoted word string locations, and their associated confidences (that is, PC and QC) for each word boundary. The predicted punctuation marks and their confidences were then combined with traditional linguistic features to predict, using multilayer perceptrons, the prosodic–acoustic features needed for speech synthesis. Both objective and subjective tests demonstrated that the prosody generated with the proposed linguistic features was superior to that generated without them. Therefore, the proposed PC and QC are identified as promising features for Mandarin prosody generation.
first_indexed 2024-12-11T23:50:17Z
format Article
id doaj.art-a82870608c964bcc945312fd7108fbae
institution Directory Open Access Journal
issn 1687-4722
language English
last_indexed 2024-12-11T23:50:17Z
publishDate 2019-02-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Audio, Speech, and Music Processing
spelling doaj.art-a82870608c964bcc945312fd7108fbae
2022-12-22T00:45:29Z
eng
SpringerOpen
EURASIP Journal on Audio, Speech, and Music Processing
1687-4722
2019-02-01
20191122
10.1186/s13636-019-0147-y
Punctuation-generation-inspired linguistic features for Mandarin prosody generation
Chen-Yu Chiang (Department of Communication Engineering, National Taipei University)
Yu-Ping Hung (Department of Communication Engineering, National Taipei University)
Han-Yun Yeh (Department of Communication Engineering, National Taipei University)
I-Bin Liao (Telecommunication Laboratories, Chunghwa Telecom)
Chen-Ming Pan (Telecommunication Laboratories, Chunghwa Telecom)
http://link.springer.com/article/10.1186/s13636-019-0147-y
Conditional random field
Multilayer perceptron
Text-to-speech system
Prosody generation
Linguistic feature
Speech synthesis
title Punctuation-generation-inspired linguistic features for Mandarin prosody generation
title_sort punctuation generation inspired linguistic features for mandarin prosody generation
topic Conditional random field
Multilayer perceptron
Text-to-speech system
Prosody generation
Linguistic feature
Speech synthesis
url http://link.springer.com/article/10.1186/s13636-019-0147-y
work_keys_str_mv AT chenyuchiang punctuationgenerationinspiredlinguisticfeaturesformandarinprosodygeneration
AT yupinghung punctuationgenerationinspiredlinguisticfeaturesformandarinprosodygeneration
AT hanyunyeh punctuationgenerationinspiredlinguisticfeaturesformandarinprosodygeneration
AT ibinliao punctuationgenerationinspiredlinguisticfeaturesformandarinprosodygeneration
AT chenmingpan punctuationgenerationinspiredlinguisticfeaturesformandarinprosodygeneration