Labeling, discovering, and detecting objects in images
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Main Author: | Russell, Bryan Christopher, 1979- |
---|---|
Other Authors: | William T. Freeman |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2008 |
Subjects: | Electrical Engineering and Computer Science |
Online Access: | http://hdl.handle.net/1721.1/43057 |
_version_ | 1826202583001202688 |
---|---|
author | Russell, Bryan Christopher, 1979- |
author2 | William T. Freeman. |
author_facet | William T. Freeman. Russell, Bryan Christopher, 1979- |
author_sort | Russell, Bryan Christopher, 1979- |
collection | MIT |
description | Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. |
first_indexed | 2024-09-23T12:09:56Z |
format | Thesis |
id | mit-1721.1/43057 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T12:09:56Z |
publishDate | 2008 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/43057 2019-04-12T09:25:38Z Labeling, discovering, and detecting objects in images Russell, Bryan Christopher, 1979- William T. Freeman. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 131-138). Recognizing the many objects that comprise our visual world is a difficult task. Confounding factors, such as intra-class object variation, clutter, pose, lighting, dealing with never-before-seen objects, scale, and lack of visual experience often fool existing recognition systems. In this thesis, we explore three issues that address a few of these factors: the importance of labeled image databases for recognition, the ability to discover object categories from simply looking at many images, and the use of large labeled image databases to efficiently detect objects embedded in scenes. For each of the issues above, we will need to cope with large collections of images. We begin by introducing LabelMe, a large labeled image database collected from users via a web annotation tool. The users of the annotation tool provided information about the identity, location, and extent of objects in images. Through this effort, we have collected about 160,000 images and 200,000 object labels to date. We show that the database spans more object categories and scenes and offers a wider range of appearance variation than most other labeled databases for object recognition. We also provide four useful extensions of the database: (i) resolving synonym ambiguities that arise in the object labels, (ii) recovering object-part relationships, (iii) extracting a depth ordering of the labeled objects in an image, and (iv) providing a semi-automatic process for the fast labeling of images. We then seek to learn models of objects in the extreme case when no supervision is provided. We draw inspiration from the success of unsupervised topic discovery in text. We apply the Latent Dirichlet Allocation model of Blei et al. to unlabeled images to automatically discover object categories. To achieve this, we employ the visual words representation of images, which is analogous to the words in text. We show that our unsupervised model achieves comparable classification performance to a model trained with supervision on an unseen image set depicting several object classes. We also successfully localize the discovered object classes in images. While the image representation used for the object discovery process is simple to compute and can distinguish between different object categories, it does not capture explicit spatial information about regions in different parts of the image. We describe a procedure for combining image segmentation with the object discovery process to by Bryan Christopher Russell. Ph.D. 2008-11-07T18:57:14Z 2008-11-07T18:57:14Z 2008 2008 Thesis http://hdl.handle.net/1721.1/43057 243866111 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 138 p. application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Russell, Bryan Christopher, 1979- Labeling, discovering, and detecting objects in images |
title | Labeling, discovering, and detecting objects in images |
title_sort | labeling discovering and detecting objects in images |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/43057 |
work_keys_str_mv | AT russellbryanchristopher1979 labelingdiscoveringanddetectingobjectsinimages |