Context-based Visual Feedback Recognition

PhD thesis

Bibliographic Details
Main Author: Morency, Louis-Philippe
Other Authors: Trevor Darrell
Language: en_US
Published: 2006
Online Access: http://hdl.handle.net/1721.1/34893
Institution: Massachusetts Institute of Technology
Subject: Vision

Abstract:
During face-to-face conversation, people use visual feedback (e.g., head and eye gestures) to communicate relevant information and to synchronize rhythm between participants. When recognizing visual feedback, people often rely on more than their visual perception. For instance, knowledge about the current topic and about previous utterances helps guide the recognition of nonverbal cues. The goal of this thesis is to augment computer interfaces with the ability to perceive visual feedback gestures, and to enable the exploitation of contextual information from the current interaction state to improve visual feedback recognition.

We introduce the concept of visual feedback anticipation, where contextual knowledge from an interactive system (e.g., the last spoken utterance from the robot, or system events from the GUI interface) is analyzed online to anticipate visual feedback from a human participant and improve visual feedback recognition. Our multi-modal framework for context-based visual feedback recognition was successfully tested on conversational and non-embodied interfaces for head and eye gesture recognition.

We also introduce the Frame-based Hidden-state Conditional Random Field (FHCRF) model, a new discriminative model for visual gesture recognition which can model the sub-structure of a gesture sequence, learn the dynamics between gesture labels, and be applied directly to label unsegmented sequences. The FHCRF model outperforms previous approaches (i.e., HMM, SVM and CRF) for visual gesture recognition and can efficiently learn the relevant contextual information necessary for visual feedback anticipation.

A real-time visual feedback recognition library for interactive interfaces (called Watson) was developed to recognize head gaze, head gestures, and eye gaze using the images from a monocular or stereo camera and the context information from the interactive system. Watson was downloaded by more than 70 researchers around the world and was successfully used by MERL, USC, NTT, the MIT Media Lab and many other research groups.

Report Number: MIT-CSAIL-TR-2006-075
Date Issued: 2006-11-15
Date Available: 2006-11-17
Publisher: Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
Extent: 195 p.
Format: application/pdf, application/postscript
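The abstract's claim that the model can "label unsegmented sequences" means assigning a gesture label to every frame of a continuous video stream, rather than classifying pre-cut clips. The spirit of that frame-level decoding can be illustrated with a generic Viterbi pass over per-frame label scores. This is only a minimal sketch of sequence labeling in general, not the thesis's actual FHCRF model; all function names, scores, and the two-label setup below are hypothetical.

```python
import numpy as np

def viterbi_decode(frame_scores, transition):
    """Return the best label for each frame of an unsegmented sequence.

    frame_scores: (T, L) array, score of each label at each frame
    transition:   (L, L) array, score for moving from label i to label j
    (Illustrative sketch only; not the FHCRF model from the thesis.)
    """
    T, L = frame_scores.shape
    dp = np.full((T, L), -np.inf)      # best path score ending in each label
    back = np.zeros((T, L), dtype=int)  # backpointers for path recovery
    dp[0] = frame_scores[0]
    for t in range(1, T):
        cand = dp[t - 1][:, None] + transition   # (L, L): prev label x next label
        back[t] = np.argmax(cand, axis=0)        # best previous label per column
        dp[t] = cand[back[t], np.arange(L)] + frame_scores[t]
    labels = np.zeros(T, dtype=int)
    labels[-1] = int(np.argmax(dp[-1]))
    for t in range(T - 1, 0, -1):                # backtrace the best path
        labels[t - 1] = back[t, labels[t]]
    return labels.tolist()

# Toy example: 2 labels (0 = no-gesture, 1 = head-nod), 5 frames.
scores = np.array([[2., 0.], [0., 2.], [0., 2.], [0., 2.], [2., 0.]])
trans = np.array([[0.5, 0.], [0., 0.5]])  # mild preference to stay in a label
print(viterbi_decode(scores, trans))      # prints [0, 1, 1, 1, 0]
```

The transition scores encourage temporally coherent segments, which is one reason frame-level sequence models beat per-frame classifiers on unsegmented gesture streams.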