Unsupervised discovery of parts, structure, and dynamics

Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model f...

Full description

Bibliographic Details
Main Authors: Xu, Zhenjia, Liu, Zhijian, Sun, Chen, Murphy, Kevin, Freeman, William T, Tenenbaum, Joshua B, Wu, Jiajun
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language:English
Published: 2020
Online Access:https://hdl.handle.net/1721.1/126655
Description
Summary:Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to, first, recognize the object parts via a layered image representation; second, predict hierarchy via a structural descriptor that composes low-level concepts into a hierarchical structure; and third, model the system dynamics by predicting the future. Experiments on multiple real and synthetic datasets demonstrate that our PSD model works well on all three tasks: segmenting object parts, building their hierarchical structure, and capturing their motion distributions.