Learning to Exploit Stability for 3D Scene Parsing

© 2018 Curran Associates Inc. All rights reserved. Human scene understanding uses a variety of visual and non-visual cues to perform inference on object types, poses, and relations. Physics is a rich and universal cue that we exploit to enhance scene understanding. In this paper, we integrate the phy...

Bibliographic Details
Main Authors:	Du, Yilun, Liu, Zhijian, Basevi, Hector, Leonardis, Aleš, Freeman, William T., Tenenbaum, Joshua B., Wu, Jiajun
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	English
Published:	2021
Online Access:	https://hdl.handle.net/1721.1/137464

_version_	1811074080504283136
author	Du, Yilun Liu, Zhijian Basevi, Hector Leonardis, Aleš Freeman, William T. Tenenbaum, Joshua B. Wu, Jiajun
author2	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Du, Yilun Liu, Zhijian Basevi, Hector Leonardis, Aleš Freeman, William T. Tenenbaum, Joshua B. Wu, Jiajun
author_sort	Du, Yilun
collection	MIT
description	© 2018 Curran Associates Inc. All rights reserved. Human scene understanding uses a variety of visual and non-visual cues to perform inference on object types, poses, and relations. Physics is a rich and universal cue that we exploit to enhance scene understanding. In this paper, we integrate the physical cue of stability into the learning process by looping a physics engine into bottom-up recognition models, and apply it to the problem of 3D scene parsing. We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples. We then present a novel architecture for 3D scene parsing named Prim R-CNN, which learns to predict bounding boxes as well as their 3D size, translation, and rotation. With physics supervision, Prim R-CNN outperforms existing scene understanding approaches on this problem. Finally, we show that finetuning with physics supervision on unlabeled real images improves real domain transfer of models trained on synthetic data.
first_indexed	2024-09-23T09:42:51Z
format	Article
id	mit-1721.1/137464
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T09:42:51Z
publishDate	2021
record_format	dspace
spelling	mit-1721.1/1374642022-09-30T16:24:16Z Learning to Exploit Stability for 3D Scene Parsing Du, Yilun Liu, Zhijian Basevi, Hector Leonardis, Aleš Freeman, William T. Tenenbaum, Joshua B. Wu, Jiajun Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory © 2018 Curran Associates Inc. All rights reserved. Human scene understanding uses a variety of visual and non-visual cues to perform inference on object types, poses, and relations. Physics is a rich and universal cue that we exploit to enhance scene understanding. In this paper, we integrate the physical cue of stability into the learning process by looping a physics engine into bottom-up recognition models, and apply it to the problem of 3D scene parsing. We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples. We then present a novel architecture for 3D scene parsing named Prim R-CNN, which learns to predict bounding boxes as well as their 3D size, translation, and rotation. With physics supervision, Prim R-CNN outperforms existing scene understanding approaches on this problem. Finally, we show that finetuning with physics supervision on unlabeled real images improves real domain transfer of models trained on synthetic data. 2021-11-05T14:03:59Z 2021-11-05T14:03:59Z 2018 2019-05-28T12:38:26Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/137464 Du, Yilun, Liu, Zhijian, Basevi, Hector, Leonardis, Aleš, Freeman, William T. et al. 2018. "Learning to Exploit Stability for 3D Scene Parsing." en https://papers.nips.cc/paper/7444-learning-to-exploit-stability-for-3d-scene-parsing Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
application/pdf Neural Information Processing Systems (NIPS)
spellingShingle	Du, Yilun Liu, Zhijian Basevi, Hector Leonardis, Aleš Freeman, William T. Tenenbaum, Joshua B. Wu, Jiajun Learning to Exploit Stability for 3D Scene Parsing
title	Learning to Exploit Stability for 3D Scene Parsing
title_full	Learning to Exploit Stability for 3D Scene Parsing
title_fullStr	Learning to Exploit Stability for 3D Scene Parsing
title_full_unstemmed	Learning to Exploit Stability for 3D Scene Parsing
title_short	Learning to Exploit Stability for 3D Scene Parsing
title_sort	learning to exploit stability for 3d scene parsing
url	https://hdl.handle.net/1721.1/137464
work_keys_str_mv	AT duyilun learningtoexploitstabilityfor3dsceneparsing AT liuzhijian learningtoexploitstabilityfor3dsceneparsing AT basevihector learningtoexploitstabilityfor3dsceneparsing AT leonardisales learningtoexploitstabilityfor3dsceneparsing AT freenmanwilliamt learningtoexploitstabilityfor3dsceneparsing AT tenenbaumjoshuab learningtoexploitstabilityfor3dsceneparsing AT wujiajun learningtoexploitstabilityfor3dsceneparsing

Similar Items