Generative modeling of dynamic visual scenes

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.

Bibliographic Details
Main Author: Lin, Dahua, Ph. D. Massachusetts Institute of Technology
Other Authors: John Fisher.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2013
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/78453
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (p. 301-312).

Abstract: Modeling visual scenes is one of the fundamental tasks of computer vision. While tremendous effort has been devoted to video analysis over the past decades, most prior work focuses on specific tasks, leading to dedicated methods for solving them. This thesis instead aims to derive a probabilistic generative model that coherently integrates different aspects of a scene, notably appearance, motion, and the interaction between them. Specifically, the model treats each video as a composite of dynamic layers, each associated with a covering domain, an appearance template, and a flow describing its motion. These layers change dynamically following their associated flows and are combined into video frames according to a Z-order that specifies their relative depth. To describe the layers and their dynamic changes, three major components are incorporated: (1) An appearance model describes the generative process of the pixel values of a video layer. By combining a probabilistic patch manifold with a conditional Markov random field, this model expresses rich local detail while maintaining global coherence. (2) A motion model captures the motion pattern of a layer through a new concept called geometric flow, which originates from differential geometric analysis.
A geometric flow unifies the trajectory-based representation and the notion of geometric transformation to represent collective dynamic behaviors that persist over time. (3) A partial Z-order specifies the relative depth order between layers. Through the unique correspondence between equivalence classes of partial orders and consistent choice functions, a distribution over the space of partial orders is established, and inference can be performed thereon. The development of these models raises significant challenges in probabilistic modeling and inference that require new techniques. We studied two important problems: (1) Both the appearance model and the motion model rely on mixture modeling to capture complex distributions. In a dynamic setting, the component parameters and the number of components in a mixture model can change over time. While using Dirichlet processes (DPs) as priors allows an indefinite number of components, incorporating temporal dependencies between DPs remains a nontrivial issue, both theoretically and practically. Our work on this problem leads to a new construction of dependent DPs, enabling various forms of dynamic variation for nonparametric mixture models by harnessing the connections between Poisson and Dirichlet processes. (2) Inferring the partial Z-order of a video requires sampling from the posterior distribution over partial orders. A key challenge is that the underlying space of partial orders is disconnected, so one may not be able to make local updates without violating the combinatorial constraints that define a partial order. We developed a novel sampling method to tackle this problem, which dynamically introduces virtual states as bridges between different parts of the space, implicitly yielding an ergodic Markov chain over an augmented space.
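The combinatorial constraints mentioned above can be made concrete with a small check. This is purely an illustration (the function name and the pair-set encoding of the Z-order are my own, not the thesis's representation), showing why a naive local move can leave the space of valid partial orders:

```python
def is_strict_partial_order(elements, rel):
    """Check the constraints of a strict partial order.

    rel is a set of (a, b) pairs, read here as "layer a is in front
    of layer b" (the direction is just a convention for this sketch).
    """
    for a in elements:
        if (a, a) in rel:  # irreflexivity: no layer is in front of itself
            return False
    for (a, b) in rel:
        for (c, d) in rel:
            if b == c and (a, d) not in rel:  # transitivity must hold
                return False
    return True

# A valid Z-order on three layers: 1 in front of 2 in front of 3.
full = {(1, 2), (2, 3), (1, 3)}
print(is_strict_partial_order({1, 2, 3}, full))             # True
# Removing a single pair -- a "local" update -- breaks transitivity:
print(is_strict_partial_order({1, 2, 3}, full - {(1, 3)}))  # False
```

Because single-pair updates like the one above can exit the valid set, a Markov chain restricted to valid partial orders can get stuck in disconnected regions, which is the motivation the abstract gives for bridging virtual states.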
With this generative model of visual scenes, many vision problems can be readily solved through inference performed on the model. Empirical experiments demonstrate that the framework yields promising results on a series of practical tasks, including video denoising and inpainting, collective motion analysis, and semantic scene understanding.

Physical Description: 312 p. (application/pdf)
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See http://dspace.mit.edu/handle/1721.1/7582 for inquiries about permission.
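As a sketch of the nonparametric mixture machinery the abstract builds on: a plain Dirichlet-process prior over mixture weights can be simulated by stick-breaking (the GEM construction). This is the generic textbook construction, not the thesis's dependent-DP extension; all names and parameter values here are illustrative:

```python
import numpy as np

def stick_breaking_weights(alpha, rng, tol=1e-6):
    """Draw mixture weights from DP(alpha) via stick-breaking (GEM).

    Each step breaks off a Beta(1, alpha) fraction of the remaining
    stick; truncation at tol approximates the infinite sequence.
    """
    weights, remaining = [], 1.0
    while remaining > tol:
        v = rng.beta(1.0, alpha)        # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= (1.0 - v)
    return np.array(weights)

rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=2.0, rng=rng)
print(len(w), w.sum())  # many components; total weight close to 1
```

Larger alpha spreads mass over more components; the modeling problem the abstract highlights is making such weights (and the number of active components) evolve coherently over time.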