Deep convolutional inverse graphics network

This paper presents the Deep Convolution Inverse Graphics Network (DC-IGN), a model that aims to learn an interpretable representation of images, disentangled with respect to three-dimensional scene structure and viewing transformations such as depth rotations and lighting variations. The DC-IGN model is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm [10]. We propose a training procedure to encourage neurons in the graphics code layer to represent a specific transformation (e.g. pose or light). Given a single input image, our model can generate new images of the same object with variations in pose and lighting. We present qualitative and quantitative tests of the model's efficacy at learning a 3D rendering engine for varied object classes including faces and chairs.
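The abstract mentions two technical ingredients: training with Stochastic Gradient Variational Bayes (the reparameterization trick plus a closed-form KL term) and a procedure that encourages individual graphics-code neurons to capture one transformation each. The following is a minimal NumPy sketch of those two ideas, not the authors' implementation; the function names and the batch-mean clamping heuristic are illustrative assumptions based on the abstract's description.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    # SGVB reparameterization: z = mu + sigma * eps with eps ~ N(0, I),
    # which makes the sampled latent differentiable w.r.t. mu and log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def clamp_inactive_latents(z, active_idx):
    # Sketch of a disentangling step: for a minibatch in which only one
    # scene variable (e.g. pose) changes, replace every other latent with
    # its batch mean so only the "active" unit carries the variation.
    z_clamped = np.tile(z.mean(axis=0), (z.shape[0], 1))
    z_clamped[:, active_idx] = z[:, active_idx]
    return z_clamped

# Toy batch of 4 examples with a 3-dimensional graphics code.
mu = np.zeros((4, 3))
log_var = np.zeros((4, 3))
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)  # zero when posterior == prior
```

In this sketch `kl` is exactly zero because the toy posterior equals the standard-normal prior; in training, the KL term would be added to a reconstruction loss and minimized by gradient descent.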


Bibliographic Details
Main Authors: Kohli, Pushmeet, Kulkarni, Tejas Dattatraya, Whitney, William F., Tenenbaum, Joshua B
Other Authors: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Format: Article
Published: Neural Information Processing Systems Foundation, Inc 2017
Online Access: http://hdl.handle.net/1721.1/112752
https://orcid.org/0000-0002-7077-2765
https://orcid.org/0000-0002-0628-6789
https://orcid.org/0000-0002-1925-2035
Citation: Kulkarni, Tejas D. et al. "Deep convolutional inverse graphics network." Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS 2015), December 7-12, 2015, Montreal, Canada. Neural Information Processing Systems Foundation, 2015.
Publisher's Version: https://papers.nips.cc/paper/5851-deep-convolutional-inverse-graphics-network
Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.