3D Object-Oriented Learning: An End-to-end Transformation-Disentangled 3D Representation

Full description

We provide a more detailed explanation of the ideas behind a recent paper on “Object-Oriented Deep Learning” [1] and extend them to handle 3D inputs and outputs. As in [1], every layer of the system takes in a list of “objects/symbols”, processes it, and outputs another list of objects/symbols. In this report, the properties of the objects/symbols are extended to carry 3D information, including 3D orientation (i.e., a rotation quaternion, or yaw, pitch and roll) and one extra coordinate dimension (the z-axis, or depth). The resulting model is a novel, end-to-end interpretable 3D representation that systematically factors out common 3D transformations such as translation and 3D rotation. As first proposed in [1] and discussed in more detail in [2], it offers a “symbolic disentanglement” solution to the problem of transformation invariance/equivariance. To demonstrate the effectiveness of the model, we show that it achieves perfect performance on the task of 3D-invariant recognition: trained on a single rotation of a 3D object, it is tested on arbitrary 3D rotations (i.e., arbitrary angles of yaw, pitch and roll). Furthermore, in the more realistic case where depth information is not given (similar to viewpoint-invariant object recognition from 2D vision), our model generalizes reasonably well to novel viewpoints, while ConvNets fail to generalize.

This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
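The abstract describes object/symbol properties that carry a 3D orientation as a rotation quaternion (or yaw, pitch and roll), so that common 3D transformations can be factored out of the representation while the object's identity stays fixed. A minimal sketch of that bookkeeping, assuming a (w, x, y, z) quaternion convention and intrinsic Z-Y-X Euler angles — this is illustrative only, not the authors' published code, and all names are assumptions:

```python
import math

def quat_from_ypr(yaw, pitch, roll):
    """Intrinsic Z-Y-X Euler angles (yaw, pitch, roll) -> unit quaternion (w, x, y, z)."""
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)

def quat_mul(a, b):
    """Hamilton product of two quaternions (composes rotations)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def quat_conj(q):
    """Conjugate; for a unit quaternion this is the inverse rotation."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, v):
    """Rotate a 3D point v by unit quaternion q, i.e. compute q * v * q^-1."""
    p = (0.0,) + tuple(v)
    return quat_mul(quat_mul(q, p), quat_conj(q))[1:]

# A hypothetical "object/symbol" whose properties include identity,
# 3D position (with the extra z/depth coordinate) and orientation.
obj = {"identity": "cube",
       "position": (1.0, 0.0, 0.5),
       "orientation": quat_from_ypr(0.0, 0.0, 0.0)}

# A global 3D rotation only updates the pose properties; the identity slot
# is untouched, which is the sense in which the transformation is
# "factored out" symbolically rather than entangled into the features.
g = quat_from_ypr(math.pi / 2, 0.0, 0.0)  # 90-degree yaw
transformed = {"identity": obj["identity"],
               "position": rotate(g, obj["position"]),
               "orientation": quat_mul(g, obj["orientation"])}
```

Quaternions are a common choice for the orientation slot because composition is a single Hamilton product and they avoid the gimbal-lock ambiguities of stacked yaw/pitch/roll angles; applying `quat_conj(g)` undoes the transformation exactly, which is what makes the representation equivariant rather than merely invariant.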

Bibliographic Details
Main Authors: Liao, Qianli; Poggio, Tomaso
Format: Technical Report
Language: en_US
Published: 2018
Online Access: http://hdl.handle.net/1721.1/113002
Series: CBMM Memo Series; 075
Date Issued: 2017-12-31
License: Attribution-NonCommercial-ShareAlike 3.0 United States (http://creativecommons.org/licenses/by-nc-sa/3.0/us/)