Perceptual Evaluation of Video-Realistic Speech

Abstract: With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation sys...


Bibliographic Details
Main Authors:	Geiger, Gadi; Ezzat, Tony; Poggio, Tomaso
Language:	en_US
Published:	2004
Subjects:	AI; visual speech; speech animation; face animation; image morphing; lip reading
Online Access:	http://hdl.handle.net/1721.1/7275

_version_	1826192781741129728
author	Geiger, Gadi; Ezzat, Tony; Poggio, Tomaso
author_sort	Geiger, Gadi
collection	MIT
description	Abstract: With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system called Mary 101. Two types of experiments were performed: (a) distinguishing visually between real and synthetic image-sequences of the same utterances ("Turing tests"), and (b) gauging visual speech recognition by comparing lip-reading performance on the real and synthetic image-sequences of the same utterances ("intelligibility tests"). Subjects who were presented randomly with either real or synthetic image-sequences could not tell the synthetic from the real sequences above chance level. The same subjects, when asked to lip-read the utterances from the same image-sequences, recognized speech from real image-sequences significantly better than from synthetic ones. However, performance for both real and synthetic image-sequences was at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing a percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes such as rehabilitation and language learning. In addition, these two tasks can be considered explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image-sequence by detecting a possible difference between the synthetic and the real image-sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech from real and from synthetic image-sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discriminating between synthetic and real image-sequences than explicit perceptual discrimination.
first_indexed	2024-09-23T09:28:46Z
id	mit-1721.1/7275
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T09:28:46Z
publishDate	2004
record_format	dspace
spelling	mit-1721.1/7275 2019-04-12T08:34:36Z 2004-10-20T21:05:09Z 2003-02-28 AIM-2003-003 CBCL-224 http://hdl.handle.net/1721.1/7275 en_US 17 p. 1515741 bytes 1358361 bytes application/postscript application/pdf
title	Perceptual Evaluation of Video-Realistic Speech
title_sort	perceptual evaluation of video realistic speech
topic	AI; visual speech; speech animation; face animation; image morphing; lip reading
url	http://hdl.handle.net/1721.1/7275
work_keys_str_mv	AT geigergadi perceptualevaluationofvideorealisticspeech AT ezzattony perceptualevaluationofvideorealisticspeech AT poggiotomaso perceptualevaluationofvideorealisticspeech
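
The abstract reports that subjects could not tell synthetic from real image-sequences above chance level. As an illustration only, the following Python sketch shows one common way such a claim can be checked: an exact two-sided binomial test of the number of correct real/synthetic judgments against the 50% chance rate. The function name, the choice of test, and the counts used are assumptions made for this example; they are not taken from the report, which may have used a different analysis.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null hypothesis (success rate p).
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance so ties are not lost to rounding error.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts, NOT data from the report:
# 52 correct real/synthetic judgments out of 100 trials.
p_near_chance = binomial_two_sided_p(52, 100)

# A hypothetical subject pool that clearly beats chance: 70 out of 100.
p_above_chance = binomial_two_sided_p(70, 100)

print(f"52/100 correct: p = {p_near_chance:.3f}")
print(f"70/100 correct: p = {p_above_chance:.6f}")
```

With a conventional 0.05 threshold, the first hypothetical result would be consistent with chance-level discrimination, while the second would not. A non-significant result of this kind does not prove the two kinds of sequence are indistinguishable; it only fails to show a difference, which fits the abstract's point that the implicit lip-reading comparison was the more sensitive measure.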

