Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization

Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been much progress and there are already object recognition systems operating in commercial products. However, most of the algorithms for detecting objects perform an exhaustive search...

Bibliographic Details
Main Authors:	Torralba, Antonio; Murphy, K. P.; Freeman, William T.
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	Association for Computing Machinery (ACM) 2012
Online Access:	http://hdl.handle.net/1721.1/73074 https://orcid.org/0000-0003-4915-0256

_version_	1811074284675661824
author	Torralba, Antonio; Murphy, K. P.; Freeman, William T.
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Torralba, Antonio; Murphy, K. P.; Freeman, William T.
author_sort	Torralba, Antonio
collection	MIT
description	Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been much progress and there are already object recognition systems operating in commercial products. However, most of the algorithms for detecting objects perform an exhaustive search across all locations and scales in the image comparing local image regions with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem by brute force. In the real world, objects tend to covary with other objects, providing a rich collection of contextual associations. These contextual associations can be used to reduce the search space by looking only in places in which the object is expected to be; this also increases performance, by rejecting patterns that look like the target but appear in unlikely places. Most modeling attempts so far have defined the context of an object in terms of other previously recognized objects. The drawback of this approach is that inferring the context becomes as difficult as detecting each object. An alternative view of context relies on using the entire scene information holistically. This approach is algorithmically attractive since it dispenses with the need for a prior step of individual object recognition. In this paper, we use a probabilistic framework for encoding the relationships between context and object properties and we show how an integrated system provides improved performance. We view this as a significant step toward general purpose machine vision systems.
first_indexed	2024-09-23T09:46:38Z
format	Article
id	mit-1721.1/73074
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T09:46:38Z
publishDate	2012
publisher	Association for Computing Machinery (ACM)
record_format	dspace
spelling	mit-1721.1/73074 2022-09-26T13:37:26Z Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization Torralba, Antonio Murphy, K. P. Freeman, William T. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Freeman, William T. Torralba, Antonio Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been much progress and there are already object recognition systems operating in commercial products. However, most of the algorithms for detecting objects perform an exhaustive search across all locations and scales in the image comparing local image regions with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem by brute force. In the real world, objects tend to covary with other objects, providing a rich collection of contextual associations. These contextual associations can be used to reduce the search space by looking only in places in which the object is expected to be; this also increases performance, by rejecting patterns that look like the target but appear in unlikely places. Most modeling attempts so far have defined the context of an object in terms of other previously recognized objects. The drawback of this approach is that inferring the context becomes as difficult as detecting each object. An alternative view of context relies on using the entire scene information holistically. This approach is algorithmically attractive since it dispenses with the need for a prior step of individual object recognition. In this paper, we use a probabilistic framework for encoding the relationships between context and object properties and we show how an integrated system provides improved performance. We view this as a significant step toward general purpose machine vision systems. United States. National Geospatial-Intelligence Agency (NEGI-1582-04-0004) United States. Army Research Office. Multidisciplinary University Research Initiative (Grant Number N00014-06-1-0734) National Science Foundation (U.S.). (Contract IIS-0413232) National Defense Science and Engineering Graduate Fellowship 2012-09-20T17:34:06Z 2012-09-20T17:34:06Z 2010-03 Article http://purl.org/eprint/type/JournalArticle 0001-0782 http://hdl.handle.net/1721.1/73074 A. Torralba, K. P. Murphy, and W. T. Freeman. 2010. Using the forest to see the trees: exploiting context for visual object detection and localization. Communications of the ACM 53, 3 (March 2010), 107-114. https://orcid.org/0000-0003-4915-0256 en_US http://dx.doi.org/10.1145/1666420.1666446 Communications of the ACM Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/ application/pdf Association for Computing Machinery (ACM) Other University Web Domain
spellingShingle	Torralba, Antonio Murphy, K. P. Freeman, William T. Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization
title	Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization
title_sort	using the forest to see the trees exploiting context for visual object detection and localization
url	http://hdl.handle.net/1721.1/73074 https://orcid.org/0000-0003-4915-0256

Similar Items