GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale
The paper focuses on the description of a system for the automatic evaluation of synthetic speech quality based on the Gaussian mixture model (GMM) classifier. The speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between t...
Main Authors: | Jiří Přibil, Anna Přibilová, Jindřich Matoušek |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2020-12-01 |
Series: | Applied Sciences |
Subjects: | GMM classification, statistical analysis, synthetic speech evaluation, text-to-speech system |
Online Access: | https://www.mdpi.com/2076-3417/11/1/2 |
_version_ | 1797543911172866048 |
---|---|
author | Jiří Přibil Anna Přibilová Jindřich Matoušek |
author_facet | Jiří Přibil Anna Přibilová Jindřich Matoušek |
author_sort | Jiří Přibil |
collection | DOAJ |
description | The paper focuses on the description of a system for the automatic evaluation of synthetic speech quality based on the Gaussian mixture model (GMM) classifier. The speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between them. The final evaluation order is determined by distances in the Pleasure-Arousal (P-A) space between the original and synthetic speech using different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMM models for continual 2D detection of P-A classes are trained using the sound/speech material from the databases without any relation to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show a substantial influence of the number of mixtures, the number and type of the speech features used, the size of the processed speech material, as well as the type of the database used for the creation of the GMMs on the P-A classification process and on the final evaluation result. The main evaluation experiments confirm the functionality of the system developed. The objective evaluation results obtained are principally correlated with the subjective ratings of human evaluators; however, partial differences were indicated, so a subsequent detailed investigation must be performed. |
first_indexed | 2024-03-10T13:52:10Z |
format | Article |
id | doaj.art-46203629ba994a0e9da51ac05aa0bb48 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T13:52:10Z |
publishDate | 2020-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-46203629ba994a0e9da51ac05aa0bb48; 2023-11-21T02:00:09Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-12-01; 11; 1; 2; 10.3390/app11010002; GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale; Jiří Přibil (0); Anna Přibilová (1); Jindřich Matoušek (2); (0) Institute of Measurement Science, Slovak Academy of Sciences, 841 04 Bratislava, Slovakia; (1) Institute of Measurement Science, Slovak Academy of Sciences, 841 04 Bratislava, Slovakia; (2) Faculty of Applied Sciences, UWB, 306 14 Pilsen, Czech Republic; The paper focuses on the description of a system for the automatic evaluation of synthetic speech quality based on the Gaussian mixture model (GMM) classifier. The speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between them. The final evaluation order is determined by distances in the Pleasure-Arousal (P-A) space between the original and synthetic speech using different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMM models for continual 2D detection of P-A classes are trained using the sound/speech material from the databases without any relation to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show a substantial influence of the number of mixtures, the number and type of the speech features used, the size of the processed speech material, as well as the type of the database used for the creation of the GMMs on the P-A classification process and on the final evaluation result. The main evaluation experiments confirm the functionality of the system developed. The objective evaluation results obtained are principally correlated with the subjective ratings of human evaluators; however, partial differences were indicated, so a subsequent detailed investigation must be performed.; https://www.mdpi.com/2076-3417/11/1/2; GMM classification; statistical analysis; synthetic speech evaluation; text-to-speech system |
spellingShingle | Jiří Přibil Anna Přibilová Jindřich Matoušek GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale Applied Sciences GMM classification statistical analysis synthetic speech evaluation text-to-speech system |
title | GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale |
topic | GMM classification statistical analysis synthetic speech evaluation text-to-speech system |
url | https://www.mdpi.com/2076-3417/11/1/2 |