Grounding language in events
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Main Author: | Fleischman, Michael Ben |
---|---|
Other Authors: | Deb Roy. |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology 2009 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/46548 |
_version_ | 1826196897526710272 |
---|---|
author | Fleischman, Michael Ben |
author2 | Deb Roy. |
author_facet | Deb Roy. Fleischman, Michael Ben |
author_sort | Fleischman, Michael Ben |
collection | MIT |
description | Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. |
first_indexed | 2024-09-23T10:39:41Z |
format | Thesis |
id | mit-1721.1/46548 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T10:39:41Z |
publishDate | 2009 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/46548 2019-04-10T15:17:21Z Grounding language in events Fleischman, Michael Ben Deb Roy. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 137-142). Broadcast video and virtual environments are just two of the growing number of domains in which language is embedded in multiple modalities of rich non-linguistic information. Applications for such multimodal domains are often based on traditional natural language processing techniques that ignore the connection between words and the non-linguistic context in which they are used. This thesis describes a methodology for representing these connections in models which ground the meaning of words in representations of events. Incorporating these grounded language models with text-based techniques significantly improves the performance of three multimodal applications: natural language understanding in videogames, sports video search and automatic speech recognition. Two approaches to representing the structure of events are presented and used to model the meaning of words. In the domain of virtual game worlds, a hand-designed hierarchical behavior grammar is used to explicitly represent all the various actions that an agent can take in a virtual world. This grammar is used to interpret events by parsing sequences of observed actions in order to generate hierarchical event structures. In the noisier and more open-ended domain of broadcast sports video, hierarchical temporal patterns are automatically mined from large corpora of unlabeled video data. The structure of events in video is represented by vectors of these hierarchical patterns. Grounded language models are encoded using Hierarchical Bayesian models to represent the probability of words given elements of these event structures. These grounded language models are used to incorporate non-linguistic information into text-based approaches to multimodal applications. In the virtual game domain, this non-linguistic information improves natural language understanding for a virtual agent by nearly 10% and cuts in half the negative effects of noise caused by automatic speech recognition. For broadcast video of baseball and American football, video search systems that incorporate grounded language models are shown to perform up to 33% better than text-based systems. Further, systems for recognizing speech in baseball video that use grounded language models show 25% greater word accuracy than traditional systems. by Michael Ben Fleischman. Ph.D. 2009-08-26T16:48:27Z 2009-08-26T16:48:27Z 2008 2008 Thesis http://hdl.handle.net/1721.1/46548 418279066 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 142 p. application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Fleischman, Michael Ben Grounding language in events |
title | Grounding language in events |
title_full | Grounding language in events |
title_fullStr | Grounding language in events |
title_full_unstemmed | Grounding language in events |
title_short | Grounding language in events |
title_sort | grounding language in events |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/46548 |
work_keys_str_mv | AT fleischmanmichaelben groundinglanguageinevents |