Spontaneous speech recognition using visual context-aware language models

Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003.

Bibliographic Details
Main Author:	Mukherjee, Niloy, 1978-
Other Authors:	Deb K. Roy.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2011
Subjects:	Architecture. Program In Media Arts and Sciences.
Online Access:	http://hdl.handle.net/1721.1/62380

_version_	1811079830674866176
author	Mukherjee, Niloy, 1978-
author2	Deb K. Roy.
author_facet	Deb K. Roy. Mukherjee, Niloy, 1978-
author_sort	Mukherjee, Niloy, 1978-
collection	MIT
description	Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003.
first_indexed	2024-09-23T11:21:11Z
format	Thesis
id	mit-1721.1/62380
institution	Massachusetts Institute of Technology
language	eng
last_indexed	2024-09-23T11:21:11Z
publishDate	2011
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/623802019-04-11T01:07:40Z Spontaneous speech recognition using visual context-aware language models Mukherjee, Niloy, 1978- Deb K. Roy. Massachusetts Institute of Technology. Dept. of Architecture. Program In Media Arts and Sciences. Massachusetts Institute of Technology. Dept. of Architecture. Program In Media Arts and Sciences. Architecture. Program In Media Arts and Sciences. Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003. Includes bibliographical references (p. 83-88). The thesis presents a novel situationally-aware multimodal spoken language system called Fuse that performs speech understanding for visual object selection. An experimental task was created in which people were asked to refer, using speech alone, to objects arranged on a table top. During training, Fuse acquires a grammar and vocabulary from a "show-and-tell" procedure in which visual scenes are paired with verbal descriptions of individual objects. Fuse determines a set of visually salient words and phrases and associates them to a set of visual features. Given a new scene, Fuse uses the acquired knowledge to generate class-based language models conditioned on the objects present in the scene as well as a spatial language model that predicts the occurrences of spatial terms conditioned on target and landmark objects. The speech recognizer in Fuse uses a weighted mixture of these language models to search for more likely interpretations of user speech in context of the current scene. During decoding, the weights are updated using a visual attention model which redistributes attention over objects based on partially decoded utterances. The dynamic situationally-aware language models enable Fuse to jointly infer spoken language utterances underlying speech signals as well as the identities of target objects they refer to. 
In an evaluation of the system, visual situationally-aware language modeling shows a significant decrease, of more than 30%, in speech recognition and understanding error rates. The underlying ideas of situation-aware speech understanding that have been developed in Fuse may be applied in numerous areas including assistive and mobile human-machine interfaces. by Niloy Mukherjee. S.M. 2011-04-25T15:49:45Z 2011-04-25T15:49:45Z 2003 2003 Thesis http://hdl.handle.net/1721.1/62380 54698754 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 88 p. application/pdf Massachusetts Institute of Technology
spellingShingle	Architecture. Program In Media Arts and Sciences. Mukherjee, Niloy, 1978- Spontaneous speech recognition using visual context-aware language models
title	Spontaneous speech recognition using visual context-aware language models
title_full	Spontaneous speech recognition using visual context-aware language models
title_fullStr	Spontaneous speech recognition using visual context-aware language models
title_full_unstemmed	Spontaneous speech recognition using visual context-aware language models
title_short	Spontaneous speech recognition using visual context-aware language models
title_sort	spontaneous speech recognition using visual context aware language models
topic	Architecture. Program In Media Arts and Sciences.
url	http://hdl.handle.net/1721.1/62380
work_keys_str_mv	AT mukherjeeniloy1978 spontaneousspeechrecognitionusingvisualcontextawarelanguagemodels
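
The abstract describes a recognizer that searches with a weighted mixture of object-conditioned language models, with the weights redistributed by a visual attention model as the utterance is partially decoded. The following is a minimal sketch of that general idea only, not the thesis's implementation. It assumes unigram models and a Bayes-style attention update, and every object name, word, probability, and function name (`mixture_prob`, `update_attention`, `decode_attention`) is hypothetical.

```python
from typing import Dict, List

# Hypothetical per-object unigram models P(word | object).
# The objects, vocabulary, and probabilities are invented for illustration.
OBJECT_LMS: Dict[str, Dict[str, float]] = {
    "red_ball": {"the": 0.4, "red": 0.3, "ball": 0.25, "blue": 0.025, "block": 0.025},
    "blue_block": {"the": 0.4, "red": 0.025, "ball": 0.025, "blue": 0.3, "block": 0.25},
}

# Small floor probability for words missing from a model (an assumption).
FLOOR = 1e-6


def mixture_prob(word: str, attention: Dict[str, float]) -> float:
    """P(word) under the attention-weighted mixture of object models."""
    return sum(
        weight * OBJECT_LMS[obj].get(word, FLOOR)
        for obj, weight in attention.items()
    )


def update_attention(word: str, attention: Dict[str, float]) -> Dict[str, float]:
    """Reweight objects by how well each object's model predicts the word.

    This is a simple Bayes-style update assumed for the sketch; the
    thesis's visual attention model may work differently.
    """
    scores = {
        obj: weight * OBJECT_LMS[obj].get(word, FLOOR)
        for obj, weight in attention.items()
    }
    total = sum(scores.values())
    return {obj: score / total for obj, score in scores.items()}


def decode_attention(words: List[str]) -> Dict[str, float]:
    """Start from uniform attention and update it word by word."""
    attention = {obj: 1.0 / len(OBJECT_LMS) for obj in OBJECT_LMS}
    for word in words:
        attention = update_attention(word, attention)
    return attention


if __name__ == "__main__":
    final = decode_attention(["the", "red", "ball"])
    for obj, weight in final.items():
        print(f"{obj}: {weight:.3f}")
```

In this toy setup, each decoded word shifts the mixture weights toward the objects whose models predict it well, so the same pass yields both a word probability for the recognizer and a distribution over candidate target objects.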

Similar Items