On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches b...

Bibliographic Details
Main Authors: Mattheyses Wesley, Latacz Lukas, Verhelst Werner
Format: Article
Language: English
Published: SpringerOpen 2009-01-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Online Access: http://asmp.eurasipjournals.com/content/2009/169819
_version_ 1829098996205879296
author Mattheyses Wesley
Latacz Lukas
Verhelst Werner
collection DOAJ
description Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.
first_indexed 2024-12-10T21:16:57Z
format Article
id doaj.art-b01d2ca3fc8e43c68c6899da72701e28
institution Directory Open Access Journal
issn 1687-4714
1687-4722
language English
last_indexed 2024-12-10T21:16:57Z
publishDate 2009-01-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Audio, Speech, and Music Processing
title On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech
url http://asmp.eurasipjournals.com/content/2009/169819