Meta Learning Approach to Phone Duration Modeling

One of the essential prerequisites for achieving the naturalness of synthesized speech is the possibility of the automatic prediction of phone duration, due to the high importance of segmental duration in speech perception. In this paper we present a new phone duration prediction model for the Serbi...

Bibliographic Details
Main Authors:	Sandra Sovilj-Nikić, Ivan Sovilj-Nikić, Maja Marković
Format:	Article
Language:	English
Published:	Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek 2018-01-01
Series:	Tehnički Vjesnik
Subjects:	machine learning; meta learning algorithm; phone duration model; synthesized speech
Online Access:	https://hrcak.srce.hr/file/298283

author	Sandra Sovilj-Nikić; Ivan Sovilj-Nikić; Maja Marković
collection	DOAJ
description	One of the essential prerequisites for natural-sounding synthesized speech is the ability to predict phone duration automatically, because segmental duration is highly important in speech perception. In this paper we present a new phone duration prediction model for the Serbian language using a meta learning approach. Based on data obtained from the analysis of a large speech database, we used a feature set of 21 parameters describing phones and their contexts. These include attributes related to segmental identity and manner of articulation (for consonants); attributes related to phonological context, such as segment types and voicing values of neighboring phones; presence or absence of lexical stress; morphological attributes, such as part-of-speech; and prosodic attributes, such as phonological word length, the position of the segment in the syllable, the position of the syllable in a word, the position of a word in a phrase, phrase break level, etc. The phone duration model obtained using the meta learning algorithm outperformed the best individual model by approximately 2.0% and 1.7% in terms of the relative reduction of the root-mean-squared error and the mean absolute error, respectively.
first_indexed	2024-04-24T09:26:11Z
format	Article
id	doaj.art-daedaa8cd3404787a4bf7d0a3ab67df7
institution	Directory Open Access Journal
issn	1330-3651 1848-6339
language	English
last_indexed	2024-04-24T09:26:11Z
publishDate	2018-01-01
publisher	Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek
record_format	Article
series	Tehnički Vjesnik
spelling	doaj.art-daedaa8cd3404787a4bf7d0a3ab67df7; 2024-04-15T14:53:49Z; eng; Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek; Tehnički Vjesnik; ISSN 1330-3651, 1848-6339; 2018-01-01; vol. 25, no. 3, pp. 855-860; DOI 10.17559/TV-20171002122930; Meta Learning Approach to Phone Duration Modeling; Sandra Sovilj-Nikić (Iritel a.d. Beograd, Batajnički put 23, 11080 Beograd, Serbia); Ivan Sovilj-Nikić (University of Novi Sad, Faculty of Technical Sciences, Trg Dositeja Obradovića 6, 21000 Novi Sad, Serbia); Maja Marković (University of Novi Sad, Faculty of Philosophy, Dr Zorana Đinđića 2, 21000 Novi Sad, Serbia); https://hrcak.srce.hr/file/298283; machine learning, meta learning algorithm, phone duration model, synthesized speech
title	Meta Learning Approach to Phone Duration Modeling
topic	machine learning; meta learning algorithm; phone duration model; synthesized speech
url	https://hrcak.srce.hr/file/298283

Similar Items