Infrastructure development for integration of lip reading into the SUMMIT Speech Recognizer

Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003.

Bibliographic Details
Main Author:	La, Chia-Hao, 1980-
Other Authors:	Timothy J. Hazen.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2006
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/29670

Department:	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Notes:	Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 51-52). By Chia-Hao La. M.Eng.
Summary:	This thesis describes a method for augmenting an audio-only speech recognizer with visual lip-reading information in order to improve the performance and robustness of the recognizer. The speech recognizer's variable-length audio segments are resolved with the fixed-length video frames using segment-constrained Hidden Markov Modeling. A Viterbi search over the per-segment Hidden Markov Model resolves the variable asynchrony between the audio and video streams. The two streams are combined according to a relative weighting scheme, which is determined by optimizing on a held-out data set. Although a full audio-visual system has not yet been implemented, this thesis describes the infrastructure that has been developed to accommodate integration with a visual lip-reading module that will be completed in the near future.
Dates:	2003; 2006-03-24T16:13:35Z; 2019-04-10T15:04:28Z
Identifiers:	mit-1721.1/29670; http://hdl.handle.net/1721.1/29670; 53833510
Rights:	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Physical Description:	52 p.; 1862592 bytes; 1862400 bytes; application/pdf

Similar Items