Perspective Plane Program Induction From a Single Image

© 2020 IEEE. We study the inverse graphics problem of inferring a holistic representation for natural images. Given an input image, our goal is to induce a neuro-symbolic, program-like representation that jointly models camera poses, object locations, and global scene structures. Such high-level, ho...


Bibliographic Details
Main Authors: Li, Yikai, Mao, Jiayuan, Zhang, Xiuming, Freeman, William T, Tenenbaum, Joshua B, Wu, Jiajun
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE) 2021
Online Access: https://hdl.handle.net/1721.1/138366
author Li, Yikai
Mao, Jiayuan
Zhang, Xiuming
Freeman, William T
Tenenbaum, Joshua B
Wu, Jiajun
collection MIT
description © 2020 IEEE. We study the inverse graphics problem of inferring a holistic representation for natural images. Given an input image, our goal is to induce a neuro-symbolic, program-like representation that jointly models camera poses, object locations, and global scene structures. Such high-level, holistic scene representations further facilitate low-level image manipulation tasks such as inpainting. We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image. The benefits of such joint inference are two-fold: scene regularity serves as a new cue for perspective correction, and in turn, correct perspective correction leads to a simplified scene structure, similar to how the correct shape leads to the most regular texture in shape from texture. Our proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to efficiently solve the problem. P3I outperforms a set of baselines on a collection of Internet images, across tasks including camera pose estimation, global structure inference, and down-stream image manipulation tasks.
first_indexed 2024-09-23T10:02:39Z
format Article
id mit-1721.1/138366
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T10:02:39Z
publishDate 2021
publisher Institute of Electrical and Electronics Engineers (IEEE)
record_format dspace
Record: mit-1721.1/138366, 2021-12-08T03:30:13Z
Dates: 2021-12-07T19:54:14Z, 2021-12-07T19:54:14Z, 2020, 2021-12-07T19:50:45Z
Type: Article, http://purl.org/eprint/type/ConferencePaper
Citation: Li, Yikai, Mao, Jiayuan, Zhang, Xiuming, Freeman, William T, Tenenbaum, Joshua B et al. 2020. "Perspective Plane Program Induction From a Single Image." Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
Language code: en
DOI: 10.1109/CVPR42600.2020.00449
Published in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
License: Creative Commons Attribution-Noncommercial-Share Alike, http://creativecommons.org/licenses/by-nc-sa/4.0/
File format: application/pdf
Source: arXiv
title Perspective Plane Program Induction From a Single Image