GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale

Bibliographic Details
Main Authors: Jiří Přibil, Anna Přibilová, Jindřich Matoušek
Format: Article
Language: English
Published: MDPI AG 2020-12-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/11/1/2
author Jiří Přibil
Anna Přibilová
Jindřich Matoušek
collection DOAJ
description The paper describes a system for the automatic evaluation of synthetic speech quality based on a Gaussian mixture model (GMM) classifier. Speech material from a real speaker is compared with synthesized material to determine the similarities and differences between them. The final evaluation order is determined by the distances in the Pleasure-Arousal (P-A) space between the original speech and the synthetic speech produced by different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMMs for continuous 2D detection of P-A classes are trained on sound/speech material from databases unrelated to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show that the P-A classification process and the final evaluation result are substantially influenced by the number of mixtures, the number and type of speech features used, the size of the processed speech material, and the type of database used to create the GMMs. The main evaluation experiments confirm the functionality of the developed system. The objective evaluation results are, in principle, correlated with the subjective ratings of human evaluators; however, some partial differences were found, so a more detailed follow-up investigation is needed.
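The description says that P-A classes are detected with GMMs and that the final order follows P-A distances between original and synthetic speech, but it does not give the speech features, mixture counts, covariance type, or how class decisions become P-A coordinates. The Python sketch below only illustrates the general idea under stated assumptions: diagonal-covariance GMMs whose parameters are already trained, a maximum-likelihood class decision per utterance, each utterance already mapped to a (pleasure, arousal) point, and Euclidean distance between centroids as the ranking criterion. All function names are hypothetical and are not taken from the paper.

```python
import math
from statistics import fmean

# ASSUMPTION: a GMM is a list of (weight, mean_vector, variance_vector) tuples
# with diagonal covariances; parameters are assumed to be trained elsewhere.


def log_gaussian_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance diag(var)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """log sum_k w_k N(x | mu_k, Sigma_k), evaluated with log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, class_models):
    """Return the class label whose GMM gives the highest summed log-likelihood
    over all feature vectors (frames) of one utterance."""
    scores = {
        label: sum(gmm_log_likelihood(f, gmm) for f in frames)
        for label, gmm in class_models.items()
    }
    return max(scores, key=scores.get)


def centroid(points):
    """Mean (pleasure, arousal) position of a set of utterances."""
    return (fmean(p for p, _ in points), fmean(a for _, a in points))


def rank_by_pa_distance(original_points, synthetic_points_by_method):
    """Order synthesis methods by the Euclidean distance between their P-A
    centroid and the centroid of the original speech (closest first).
    ASSUMPTION: Euclidean distance between centroids is the chosen metric."""
    ref = centroid(original_points)
    distances = {
        method: math.dist(ref, centroid(points))
        for method, points in synthetic_points_by_method.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical 1-D toy models for two classes.
    models = {
        "low": [(1.0, [0.0], [1.0])],
        "high": [(1.0, [5.0], [1.0])],
    }
    print(classify([[0.1], [-0.2]], models))  # low

    # Hypothetical P-A points for the original speech and two synthesis methods.
    original = [(0, 0), (0, 0)]
    synthetic = {"A": [(1, 0)], "B": [(3, 4)]}
    print(rank_by_pa_distance(original, synthetic))  # [('A', 1.0), ('B', 5.0)]
```

Summing per-frame log-likelihoods treats frames as independent, which is a common simplification for GMM classifiers; the paper may aggregate frames or map classes to P-A positions differently.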
first_indexed 2024-03-10T13:52:10Z
format Article
id doaj.art-46203629ba994a0e9da51ac05aa0bb48
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T13:52:10Z
publishDate 2020-12-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-46203629ba994a0e9da51ac05aa0bb48; 2023-11-21T02:00:09Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2020-12-01; vol. 11, no. 1, art. 2; DOI 10.3390/app11010002
Jiří Přibil, Anna Přibilová: Institute of Measurement Science, Slovak Academy of Sciences, 841 04 Bratislava, Slovakia
Jindřich Matoušek: Faculty of Applied Sciences, UWB, 306 14 Pilsen, Czech Republic
Keywords: GMM classification; statistical analysis; synthetic speech evaluation; text-to-speech system
https://www.mdpi.com/2076-3417/11/1/2
title GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale
topic GMM classification
statistical analysis
synthetic speech evaluation
text-to-speech system
url https://www.mdpi.com/2076-3417/11/1/2