Visual Speech Synthesis by Morphing Visemes

We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.

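The abstract describes a concrete pipeline: compute dense optical-flow correspondences between viseme images, morph along those correspondences to form transitions, concatenate the transitions into a visual utterance, and pace each morph using phoneme timing from a text-to-speech front end. The sketch below illustrates that flow under loose assumptions and is not the authors' implementation: OpenCV's Farneback flow stands in for the memo's optical-flow method, a backward-warp cross-dissolve approximates its morphing scheme, and the phoneme-to-viseme table and timing format are hypothetical placeholders.

# Hypothetical sketch of the viseme-morphing pipeline outlined in the abstract.
# Assumptions (not from the memo): Farneback optical flow replaces the paper's
# correspondence algorithm, and a backward-warp cross-dissolve replaces its
# morphing scheme. Phoneme/viseme names and timing format are placeholders.

import cv2
import numpy as np


def morph_transition(viseme_a, viseme_b, n_frames):
    """Generate n_frames grayscale images morphing viseme_a into viseme_b."""
    h, w = viseme_a.shape
    # Dense correspondence in both directions; flow[y, x] = (dx, dy).
    flow_ab = cv2.calcOpticalFlowFarneback(viseme_a, viseme_b, None,
                                           0.5, 4, 21, 3, 5, 1.2, 0)
    flow_ba = cv2.calcOpticalFlowFarneback(viseme_b, viseme_a, None,
                                           0.5, 4, 21, 3, 5, 1.2, 0)
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    frames = []
    for i in range(n_frames):
        alpha = i / max(n_frames - 1, 1)
        # Backward-warp approximation: sample each endpoint image part of the
        # way along its correspondence field, then cross-dissolve the results.
        warp_a = cv2.remap(viseme_a,
                           grid_x - alpha * flow_ab[..., 0],
                           grid_y - alpha * flow_ab[..., 1],
                           cv2.INTER_LINEAR)
        warp_b = cv2.remap(viseme_b,
                           grid_x - (1.0 - alpha) * flow_ba[..., 0],
                           grid_y - (1.0 - alpha) * flow_ba[..., 1],
                           cv2.INTER_LINEAR)
        frames.append(cv2.addWeighted(warp_a, 1.0 - alpha, warp_b, alpha, 0.0))
    return frames


def synthesize_utterance(phoneme_track, phoneme_to_viseme, viseme_images, fps=30):
    """Concatenate viseme transitions, pacing each morph by phoneme duration.

    phoneme_track: list of (phoneme, duration_seconds) pairs, e.g. from a
    text-to-speech front end (placeholder format, not the memo's).
    """
    visemes = [phoneme_to_viseme[p] for p in (ph for ph, _ in phoneme_track)]
    video = []
    for (v_from, v_to), (_, duration) in zip(zip(visemes, visemes[1:]),
                                             phoneme_track[1:]):
        n_frames = max(int(round(duration * fps)), 1)
        video.extend(morph_transition(viseme_images[v_from],
                                      viseme_images[v_to], n_frames))
    return video

In the system described, the morph rate is driven by the phoneme durations reported by the synthesizer; the placeholder synthesize_utterance above mirrors that by converting each duration into a frame count at a fixed video rate.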

Bibliographic Details
Main Authors: Ezzat, Tony; Poggio, Tomaso
Language: en_US
Published: 2004
Date Issued: 1999-05-01
Report Numbers: AIM-1658, CBCL-173
Online Access: http://hdl.handle.net/1721.1/7263