Graphical models for visual object recognition and tracking

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.

Bibliographic details
Main author:	Sudderth, Erik B. (Erik Blaine), 1977-
Other authors:	William T. Freeman and Alan S. Willsky.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2006
Subjects:	Electrical Engineering and Computer Science.
Online access:	http://hdl.handle.net/1721.1/34023

_version_	1826203403730026496
author	Sudderth, Erik B. (Erik Blaine), 1977-
author2	William T. Freeman and Alan S. Willsky.
author_facet	William T. Freeman and Alan S. Willsky. Sudderth, Erik B. (Erik Blaine), 1977-
author_sort	Sudderth, Erik B. (Erik Blaine), 1977-
collection	MIT
description	Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
first_indexed	2024-09-23T12:36:25Z
format	Thesis
id	mit-1721.1/34023
institution	Massachusetts Institute of Technology
language	eng
last_indexed	2024-09-23T12:36:25Z
publishDate	2006
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/34023 2019-04-10T09:38:59Z Graphical models for visual object recognition and tracking Sudderth, Erik B. (Erik Blaine), 1977- William T. Freeman and Alan S. Willsky. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 277-301). We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories. by Erik B. Sudderth. Ph.D. 2006-09-28T14:51:34Z 2006-09-28T14:51:34Z 2006 2006 Thesis http://hdl.handle.net/1721.1/34023 71316087 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 301 p. 19955509 bytes 19954958 bytes application/pdf application/pdf application/pdf Massachusetts Institute of Technology
spellingShingle	Electrical Engineering and Computer Science. Sudderth, Erik B. (Erik Blaine), 1977- Graphical models for visual object recognition and tracking
title	Graphical models for visual object recognition and tracking
title_full	Graphical models for visual object recognition and tracking
title_fullStr	Graphical models for visual object recognition and tracking
title_full_unstemmed	Graphical models for visual object recognition and tracking
title_short	Graphical models for visual object recognition and tracking
title_sort	graphical models for visual object recognition and tracking
topic	Electrical Engineering and Computer Science.
url	http://hdl.handle.net/1721.1/34023
work_keys_str_mv	AT suddertherikberikblaine1977 graphicalmodelsforvisualobjectrecognitionandtracking
