Inferring Structured World Models from Videos

Advances in reinforcement learning have allowed agents to learn a variety of board games and video games at superhuman levels. Unlike humans, who can generalize to a wide range of tasks with very little experience, these algorithms typically need vast numbers of experience replays to perform at the same level. In this thesis, we propose a model-based reinforcement learning approach that represents the environment using an explicit symbolic model in the form of a domain-specific language (DSL), which represents the world as a set of discrete objects with underlying latent properties that govern their dynamical interactions. We present a novel, neurally guided, online inference technique to recover the structured world representation from raw video observations, intended for use in downstream model-based planning. We qualitatively evaluate our inference performance on classical Atari games, as well as on physics-based mobile games.
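To make the abstract's representation concrete, here is a minimal sketch of what a structured world state — a set of discrete objects whose latent properties (velocity, gravity) govern their dynamics — might look like. All class and field names are illustrative assumptions, not the thesis's actual DSL.

```python
from dataclasses import dataclass

# Hypothetical sketch of a structured world model: each object has an
# observable position plus latent properties that drive its dynamics.
@dataclass
class WorldObject:
    x: float
    y: float
    vx: float = 0.0       # latent: horizontal velocity
    vy: float = 0.0       # latent: vertical velocity
    gravity: float = 0.0  # latent: per-object gravitational pull

    def step(self, dt: float = 1.0) -> "WorldObject":
        # Forward-simulate one time step under the latent properties.
        vy = self.vy + self.gravity * dt
        return WorldObject(self.x + self.vx * dt,
                           self.y + vy * dt,
                           self.vx, vy, self.gravity)

@dataclass
class WorldState:
    objects: list

    def step(self, dt: float = 1.0) -> "WorldState":
        # The world advances by stepping every object independently.
        return WorldState([o.step(dt) for o in self.objects])

# Example: a ball moving right while falling under per-object gravity.
ball = WorldObject(x=0.0, y=10.0, vx=1.0, gravity=-1.0)
state = WorldState([ball]).step()
print(state.objects[0].x, state.objects[0].y)  # 1.0 9.0
```

A planner could roll such a state forward to evaluate candidate actions, which is the "downstream model-based planning" use the abstract mentions; recovering the latent fields from video is the inference problem the thesis addresses.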

Bibliographic Details
Main Author: Kapur, Shreyas
Other Authors: Tenenbaum, Joshua B.
Format: Thesis
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/144497
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: M.Eng.
Date Issued: May 2022
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)
File Format: application/pdf