3D Object-Oriented Learning: An End-to-end Transformation-Disentangled 3D Representation

Full description

We provide a more detailed explanation of the ideas behind a recent paper on “Object-Oriented Deep Learning” [1] and extend them to handle 3D inputs and outputs. As in [1], every layer of the system takes in a list of “objects/symbols”, processes it, and outputs another list of objects/symbols. In this report, the properties of the objects/symbols are extended to carry 3D information, including 3D orientation (i.e., a rotation quaternion, or yaw, pitch and roll) and one extra coordinate dimension (the z-axis, or depth). The resulting model is a novel, end-to-end interpretable 3D representation that systematically factors out common 3D transformations such as translation and 3D rotation. As first proposed in [1] and discussed in more detail in [2], it offers a “symbolic disentanglement” solution to the problem of transformation invariance/equivariance. To demonstrate the effectiveness of the model, we show that it achieves perfect performance on the task of 3D-invariant recognition: trained on a single rotation of a 3D object, it is tested on arbitrary 3D rotations (i.e., arbitrary angles of yaw, pitch and roll). Furthermore, in the more realistic case where depth information is not given (similar to viewpoint-invariant object recognition from 2D vision), our model generalizes reasonably well to novel viewpoints, while ConvNets fail to generalize.

This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
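The abstract describes object/symbol properties that carry a 3D orientation as a rotation quaternion (or yaw, pitch and roll), so that common 3D transformations can be factored out of the representation while the object's identity stays fixed. A minimal sketch of that bookkeeping, assuming a (w, x, y, z) quaternion convention and intrinsic Z-Y-X Euler angles — this is illustrative only, not the authors' published code, and all names are assumptions:

```python
import math

def quat_from_ypr(yaw, pitch, roll):
    """Intrinsic Z-Y-X Euler angles (yaw, pitch, roll) -> unit quaternion (w, x, y, z)."""
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)

def quat_mul(a, b):
    """Hamilton product of two quaternions (composes rotations)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def quat_conj(q):
    """Conjugate; for a unit quaternion this is the inverse rotation."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, v):
    """Rotate a 3D point v by unit quaternion q, i.e. compute q * v * q^-1."""
    p = (0.0,) + tuple(v)
    return quat_mul(quat_mul(q, p), quat_conj(q))[1:]

# A hypothetical "object/symbol" whose properties include identity,
# 3D position (with the extra z/depth coordinate) and orientation.
obj = {"identity": "cube",
       "position": (1.0, 0.0, 0.5),
       "orientation": quat_from_ypr(0.0, 0.0, 0.0)}

# A global 3D rotation only updates the pose properties; the identity slot
# is untouched, which is the sense in which the transformation is
# "factored out" symbolically rather than entangled into the features.
g = quat_from_ypr(math.pi / 2, 0.0, 0.0)  # 90-degree yaw
transformed = {"identity": obj["identity"],
               "position": rotate(g, obj["position"]),
               "orientation": quat_mul(g, obj["orientation"])}
```

Quaternions are a common choice for the orientation slot because composition is a single Hamilton product and they avoid the gimbal-lock ambiguities of stacked yaw/pitch/roll angles; applying `quat_conj(g)` undoes the transformation exactly, which is what makes the representation equivariant rather than merely invariant.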

Bibliographic Details
Main Authors: Liao, Qianli; Poggio, Tomaso
Format: Technical Report
Language: en_US
Published: 2018
Online Access: http://hdl.handle.net/1721.1/113002
Series: CBMM Memo Series; 075
Date Issued: 2017-12-31
License: Attribution-NonCommercial-ShareAlike 3.0 United States (http://creativecommons.org/licenses/by-nc-sa/3.0/us/)