Inferring Structured World Models from Videos
Advances in reinforcement learning have allowed agents to learn a variety of board games and video games at superhuman levels. Unlike humans, who can generalize to a wide range of tasks with very little experience, these algorithms typically need a vast number of experience replays to perform at the same level. In this thesis, we propose a model-based reinforcement learning approach that represents the environment using an explicit symbolic model in the form of a domain-specific language (DSL), which describes the world as a set of discrete objects with underlying latent properties that govern their dynamical interactions. We present a novel, neurally guided, online inference technique to recover the structured world representation from raw video observations, with the intent that it be used for downstream model-based planning. We qualitatively evaluate our inference performance on classical Atari games, as well as on physics-based mobile games.
Main Author: | Kapur, Shreyas |
---|---|
Other Authors: | Tenenbaum, Joshua B. |
Department: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Degree: | M.Eng. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Rights: | In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/ |
Online Access: | https://hdl.handle.net/1721.1/144497 |