Punctuation-generation-inspired linguistic features for Mandarin prosody generation
Abstract This paper proposes two novel linguistic features extracted from text input for prosody generation in a Mandarin text-to-speech system. The first feature is the punctuation confidence (PC), which measures the likelihood that a major punctuation mark (MPM) can be inserted at a word boundary....
Main Authors: | Chen-Yu Chiang, Yu-Ping Hung, Han-Yun Yeh, I-Bin Liao, Chen-Ming Pan |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2019-02-01 |
Series: | EURASIP Journal on Audio, Speech, and Music Processing |
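Subjects: | Conditional random field; Multilayer perceptron; Text-to-speech system; Prosody generation; Linguistic feature; Speech synthesis |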
Online Access: | http://link.springer.com/article/10.1186/s13636-019-0147-y |
_version_ | 1818189908009811968 |
---|---|
author | Chen-Yu Chiang Yu-Ping Hung Han-Yun Yeh I-Bin Liao Chen-Ming Pan |
author_sort | Chen-Yu Chiang |
collection | DOAJ |
description | This paper proposes two novel linguistic features extracted from text input for prosody generation in a Mandarin text-to-speech system. The first feature is the punctuation confidence (PC), which measures the likelihood that a major punctuation mark (MPM) can be inserted at a word boundary. The second feature is the quotation confidence (QC), which measures the likelihood that a word string is quoted as a meaningful or emphasized unit. The proposed PC and QC features are motivated by the properties of automatic Chinese punctuation generation and the linguistic characteristics of the Chinese punctuation system. Because MPMs are highly correlated with prosodic–acoustic features and quoted word strings serve crucial roles in human language understanding, the two features could potentially provide useful information for prosody generation. This idea was realized by employing conditional-random-field-based models to predict MPMs, quoted word string locations, and their associated confidences (that is, PC and QC) for each word boundary. The predicted punctuation marks and their confidences were then combined with traditional linguistic features, and multilayer perceptrons used the combined features to predict the prosodic–acoustic features needed for speech synthesis. Both objective and subjective tests demonstrated that the prosody generated with the proposed linguistic features was superior to that generated without them. Therefore, the proposed PC and QC are identified as promising features for Mandarin prosody generation. |
first_indexed | 2024-12-11T23:50:17Z |
format | Article |
id | doaj.art-a82870608c964bcc945312fd7108fbae |
institution | Directory of Open Access Journals |
issn | 1687-4722 |
language | English |
last_indexed | 2024-12-11T23:50:17Z |
publishDate | 2019-02-01 |
publisher | SpringerOpen |
record_format | Article |
series | EURASIP Journal on Audio, Speech, and Music Processing |
spelling | doaj.art-a82870608c964bcc945312fd7108fbae; 2022-12-22T00:45:29Z; eng; SpringerOpen; EURASIP Journal on Audio, Speech, and Music Processing; 1687-4722; 2019-02-01; 20191122; 10.1186/s13636-019-0147-y; Punctuation-generation-inspired linguistic features for Mandarin prosody generation; Chen-Yu Chiang (0), Yu-Ping Hung (1), Han-Yun Yeh (2): Department of Communication Engineering, National Taipei University; I-Bin Liao (3), Chen-Ming Pan (4): Telecommunication Laboratories, Chunghwa Telecom; http://link.springer.com/article/10.1186/s13636-019-0147-y; Conditional random field; Multilayer perceptron; Text-to-speech system; Prosody generation; Linguistic feature; Speech synthesis |
title | Punctuation-generation-inspired linguistic features for Mandarin prosody generation |
topic | Conditional random field Multilayer perceptron Text-to-speech system Prosody generation Linguistic feature Speech synthesis |
url | http://link.springer.com/article/10.1186/s13636-019-0147-y |
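
The description in this record says that conditional-random-field-based models assign each word boundary a confidence that a major punctuation mark (MPM) can be inserted there. The record does not state how that confidence is computed, so the following is only a minimal sketch under one assumption: that the punctuation confidence is the posterior marginal probability of the MPM label in a two-label linear-chain model, obtained with the forward-backward algorithm. The function name `punctuation_confidence` and the `unary` and `transition` score tables are hypothetical stand-ins for whatever features and weights the authors actually trained.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def punctuation_confidence(unary, transition):
    """Return P(label = MPM | sentence) for every word boundary.

    unary[t][y]      -- hypothetical log-score of label y at boundary t
                        (y = 0: no MPM, y = 1: MPM)
    transition[a][b] -- hypothetical log-score of label a followed by label b
    This is an assumed confidence definition, not the paper's stated one.
    """
    n = len(unary)
    if n == 0:
        return []
    labels = (0, 1)

    # Forward pass: alpha[t][y] = log-sum of scores of all prefixes ending in y.
    alpha = [[0.0, 0.0] for _ in range(n)]
    alpha[0] = list(unary[0])
    for t in range(1, n):
        for b in labels:
            alpha[t][b] = unary[t][b] + logsumexp(
                [alpha[t - 1][a] + transition[a][b] for a in labels]
            )

    # Backward pass: beta[t][y] = log-sum of scores of all suffixes after y.
    beta = [[0.0, 0.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for a in labels:
            beta[t][a] = logsumexp(
                [transition[a][b] + unary[t + 1][b] + beta[t + 1][b]
                 for b in labels]
            )

    # Log partition function, then the marginal of the MPM label per boundary.
    log_z = logsumexp(alpha[n - 1])
    return [math.exp(alpha[t][1] + beta[t][1] - log_z) for t in range(n)]


if __name__ == "__main__":
    # Two boundaries, no transition preference: marginals reduce to a softmax.
    scores = [[0.0, 0.0], [0.0, math.log(3.0)]]
    print(punctuation_confidence(scores, [[0.0, 0.0], [0.0, 0.0]]))
```

With all transition scores set to zero, each boundary is independent, so the example yields confidences of approximately 0.5 and 0.75. In the pipeline the description outlines, values like these would be appended to the traditional linguistic features before being passed to the multilayer perceptrons.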