Manhattan Scene Understanding Using Monocular, Stereo, and 3D Features

This paper addresses scene understanding in the context of a moving camera, integrating semantic reasoning ideas from monocular vision with 3D information available through structure-from-motion. We combine geometric and photometric cues in a Bayesian framework, building on recent successes leveraging the indoor Manhattan assumption in monocular vision. We focus on indoor environments and show how to extract key boundaries while ignoring clutter and decorations. To achieve this we present a graphical model that relates photometric cues learned from labeled data, stereo photo-consistency across multiple views, and depth cues derived from structure-from-motion point clouds. We show how to solve MAP inference using dynamic programming, allowing exact, global inference in ∼100 ms (in addition to feature computation of under one second) without using specialized hardware. Experiments show our system outperforming the state of the art. © 2011 IEEE.
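The column-wise dynamic program the abstract mentions admits a compact illustration. The sketch below is not the authors' implementation: the function name, the Potts-style constant switch penalty, and the cost structure are all illustrative assumptions. It shows the general idea of exact, global MAP inference by a Viterbi-style pass over image columns, where each column is assigned one candidate wall-plane hypothesis and the unary costs stand in for the fused photometric, stereo, and point-cloud cues.

```python
# Minimal sketch of exact MAP inference over image columns, assuming a
# Potts-style model: one wall-plane hypothesis per column, negative
# log-likelihood unaries, and a constant penalty for switching hypotheses
# between adjacent columns. Illustrative only; not the paper's exact model.
import numpy as np

def map_inference(unary, switch_penalty):
    """Viterbi-style dynamic programming.

    unary: (n_cols, n_hyps) array of negative log-likelihoods, one row per
           image column, one entry per candidate wall-plane hypothesis.
    switch_penalty: cost added whenever adjacent columns pick different
           hypotheses (a stand-in for the model's transition terms).
    Returns the exact MAP labeling as an array of hypothesis indices.
    """
    n_cols, n_hyps = unary.shape
    cost = unary[0].copy()                      # best cost ending in each state
    back = np.zeros((n_cols, n_hyps), dtype=int)
    for t in range(1, n_cols):
        # Staying in the same hypothesis vs. switching from the best
        # previous state; the constant penalty makes this an O(n_hyps)
        # transition instead of O(n_hyps^2).
        switch = cost.min() + switch_penalty
        take_stay = cost <= switch
        back[t] = np.where(take_stay, np.arange(n_hyps), cost.argmin())
        cost = np.where(take_stay, cost, switch) + unary[t]
    # Backtrack from the cheapest final state to recover the labeling.
    labels = np.empty(n_cols, dtype=int)
    labels[-1] = cost.argmin()
    for t in range(n_cols - 1, 0, -1):
        labels[t - 1] = back[t, labels[t]]
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unary = rng.random((640, 8))    # e.g. 640 image columns, 8 hypotheses
    labels = map_inference(unary, switch_penalty=0.5)
    print(labels[:10])
```

The total cost is linear in the number of columns and hypotheses, which is consistent with the abstract's claim of exact global inference in roughly 100 ms on commodity hardware.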

Bibliographic Details
Main Authors: Flint, A, Murray, D, Reid, I, IEEE
Format: Conference item
Published: 2011
Institution: University of Oxford
Identifier: oxford-uuid:17826ba5-023f-4ec1-9a8a-fc00193a0988