GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale

The paper focuses on the description of a system for the automatic evaluation of synthetic speech quality based on the Gaussian mixture model (GMM) classifier. The speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between t...


Bibliographic Details
Main Authors:	Jiří Přibil, Anna Přibilová, Jindřich Matoušek
Format:	Article
Language:	English
Published:	MDPI AG 2020-12-01
Series:	Applied Sciences
Subjects:	GMM classification; statistical analysis; synthetic speech evaluation; text-to-speech system
Online Access:	https://www.mdpi.com/2076-3417/11/1/2

_version_	1797543911172866048
author	Jiří Přibil Anna Přibilová Jindřich Matoušek
author_facet	Jiří Přibil Anna Přibilová Jindřich Matoušek
author_sort	Jiří Přibil
collection	DOAJ
description	The paper focuses on the description of a system for the automatic evaluation of synthetic speech quality based on the Gaussian mixture model (GMM) classifier. The speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between them. The final evaluation order is determined by distances in the Pleasure-Arousal (P-A) space between the original and synthetic speech using different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMM models for continual 2D detection of P-A classes are trained using the sound/speech material from the databases without any relation to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show a substantial influence of the number of mixtures, the number and type of the speech features used, the size of the processed speech material, as well as the type of the database used for the creation of the GMMs on the P-A classification process and on the final evaluation result. The main evaluation experiments confirm the functionality of the system developed. The objective evaluation results obtained are principally correlated with the subjective ratings of human evaluators; however, partial differences were indicated, so a subsequent detailed investigation must be performed.
first_indexed	2024-03-10T13:52:10Z
format	Article
id	doaj.art-46203629ba994a0e9da51ac05aa0bb48
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T13:52:10Z
publishDate	2020-12-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-46203629ba994a0e9da51ac05aa0bb48 | 2023-11-21T02:00:09Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2020-12-01 | 11 | 1 | 2 | 10.3390/app11010002 | GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale | Jiří Přibil (0) | Anna Přibilová (1) | Jindřich Matoušek (2) | Institute of Measurement Science, Slovak Academy of Sciences, 841 04 Bratislava, Slovakia | Institute of Measurement Science, Slovak Academy of Sciences, 841 04 Bratislava, Slovakia | Faculty of Applied Sciences, UWB, 306 14 Pilsen, Czech Republic | https://www.mdpi.com/2076-3417/11/1/2 | GMM classification; statistical analysis; synthetic speech evaluation; text-to-speech system
title	GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale
topic	GMM classification; statistical analysis; synthetic speech evaluation; text-to-speech system
url	https://www.mdpi.com/2076-3417/11/1/2
work_keys_str_mv	AT jiripribil gmmbasedevaluationofsyntheticspeechqualityusing2dclassificationinpleasurearousalscale AT annapribilova gmmbasedevaluationofsyntheticspeechqualityusing2dclassificationinpleasurearousalscale AT jindrichmatousek gmmbasedevaluationofsyntheticspeechqualityusing2dclassificationinpleasurearousalscale


Similar Items