Perspective Plane Program Induction From a Single Image
© 2020 IEEE. We study the inverse graphics problem of inferring a holistic representation for natural images. Given an input image, our goal is to induce a neuro-symbolic, program-like representation that jointly models camera poses, object locations, and global scene structures. Such high-level, ho...
Main Authors: | Li, Yikai; Mao, Jiayuan; Zhang, Xiuming; Freeman, William T; Tenenbaum, Joshua B; Wu, Jiajun |
---|---|
Format: | Article |
Language: | English |
Published: | Institute of Electrical and Electronics Engineers (IEEE), 2021 |
Online Access: | https://hdl.handle.net/1721.1/138366 |
_version_ | 1826194818983788544 |
---|---|
author | Li, Yikai; Mao, Jiayuan; Zhang, Xiuming; Freeman, William T; Tenenbaum, Joshua B; Wu, Jiajun |
collection | MIT |
description | © 2020 IEEE. We study the inverse graphics problem of inferring a holistic representation for natural images. Given an input image, our goal is to induce a neuro-symbolic, program-like representation that jointly models camera poses, object locations, and global scene structures. Such high-level, holistic scene representations further facilitate low-level image manipulation tasks such as inpainting. We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image. The benefits of such joint inference are twofold: scene regularity serves as a new cue for perspective correction, and in turn, accurate perspective correction leads to a simplified scene structure, similar to how the correct shape leads to the most regular texture in shape from texture. Our proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to efficiently solve the problem. P3I outperforms a set of baselines on a collection of Internet images, across tasks including camera pose estimation, global structure inference, and downstream image manipulation tasks. |
first_indexed | 2024-09-23T10:02:39Z |
format | Article |
id | mit-1721.1/138366 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T10:02:39Z |
publishDate | 2021 |
publisher | Institute of Electrical and Electronics Engineers (IEEE) |
record_format | dspace |
spelling | mit-1721.1/138366 (2021-12-08T03:30:13Z). Li, Yikai, Mao, Jiayuan, Zhang, Xiuming, Freeman, William T, Tenenbaum, Joshua B et al. 2020. "Perspective Plane Program Induction From a Single Image." Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Dates: 2021-12-07T19:54:14Z; 2020; 2021-12-07T19:50:45Z. Type: Article (http://purl.org/eprint/type/ConferencePaper). Language: en. DOI: 10.1109/CVPR42600.2020.00449. License: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/). File format: application/pdf. Publisher: Institute of Electrical and Electronics Engineers (IEEE). arXiv. https://hdl.handle.net/1721.1/138366 |
title | Perspective Plane Program Induction From a Single Image |
url | https://hdl.handle.net/1721.1/138366 |