Compositional Policy Priors
This paper describes a probabilistic framework for incorporating structured inductive biases into reinforcement learning. These inductive biases arise from policy priors, probability distributions over optimal policies. Borrowing recent ideas from computational linguistics and Bayesian nonparametrics, we define several families of policy priors that express compositional, abstract structure in a domain.
Main Authors: | Wingate, David; Diuk, Carlos; O'Donnell, Timothy; Tenenbaum, Joshua; Gershman, Samuel |
---|---|
Other Authors: | Joshua Tenenbaum |
Published: | 2013 |
Online Access: | http://hdl.handle.net/1721.1/78573 |
_version_ | 1826207918558543872 |
---|---|
author | Wingate, David Diuk, Carlos O'Donnell, Timothy Tenenbaum, Joshua Gershman, Samuel |
author2 | Joshua Tenenbaum |
author_facet | Joshua Tenenbaum Wingate, David Diuk, Carlos O'Donnell, Timothy Tenenbaum, Joshua Gershman, Samuel |
author_sort | Wingate, David |
collection | MIT |
description | This paper describes a probabilistic framework for incorporating structured inductive biases into reinforcement learning. These inductive biases arise from policy priors, probability distributions over optimal policies. Borrowing recent ideas from computational linguistics and Bayesian nonparametrics, we define several families of policy priors that express compositional, abstract structure in a domain. Compositionality is expressed using probabilistic context-free grammars, enabling a compact representation of hierarchically organized sub-tasks. Useful sequences of sub-tasks can be cached and reused by extending the grammars nonparametrically using Fragment Grammars. We present Monte Carlo methods for performing inference, and show how structured policy priors lead to substantially faster learning in complex domains compared to methods without inductive biases. |
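The abstract describes policies drawn from a probabilistic context-free grammar, so that a full policy decomposes hierarchically into reusable sub-tasks that bottom out in primitive actions. As a rough illustration of that idea (not code from the paper; the grammar, sub-task names, and actions below are invented for the example), a PCFG over sub-tasks can be sampled recursively:

```python
import random

# Illustrative PCFG over navigation sub-tasks. Each nonterminal maps to
# (probability, expansion) pairs; symbols absent from the grammar are
# primitive actions. All names here are hypothetical, not from the paper.
GRAMMAR = {
    "Policy":   [(0.6, ["GetKey", "OpenDoor"]), (0.4, ["OpenDoor"])],
    "GetKey":   [(0.7, ["go_to_key", "pickup"]), (0.3, ["search", "GetKey"])],
    "OpenDoor": [(1.0, ["go_to_door", "unlock"])],
}

def sample_policy(symbol, rng):
    """Recursively expand `symbol` into a flat sequence of primitive actions."""
    if symbol not in GRAMMAR:  # terminal symbol: a primitive action
        return [symbol]
    r, acc = rng.random(), 0.0
    for prob, expansion in GRAMMAR[symbol]:
        acc += prob
        if r <= acc:
            break
    # Expand each symbol in the chosen rule and concatenate the results.
    return [a for s in expansion for a in sample_policy(s, rng)]

print(sample_policy("Policy", random.Random(0)))  # → ['go_to_door', 'unlock']
```

Sampling from such a grammar defines a prior over policies that favors hierarchically structured action sequences; the paper's Fragment Grammar extension would additionally cache useful sub-sequences so they can be reused as single units.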
first_indexed | 2024-09-23T13:57:01Z |
id | mit-1721.1/78573 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T13:57:01Z |
publishDate | 2013 |
record_format | dspace |
spelling | mit-1721.1/78573 2019-04-10T13:17:25Z Compositional Policy Priors Wingate, David Diuk, Carlos O'Donnell, Timothy Tenenbaum, Joshua Gershman, Samuel Joshua Tenenbaum Computational Cognitive Science This paper describes a probabilistic framework for incorporating structured inductive biases into reinforcement learning. These inductive biases arise from policy priors, probability distributions over optimal policies. Borrowing recent ideas from computational linguistics and Bayesian nonparametrics, we define several families of policy priors that express compositional, abstract structure in a domain. Compositionality is expressed using probabilistic context-free grammars, enabling a compact representation of hierarchically organized sub-tasks. Useful sequences of sub-tasks can be cached and reused by extending the grammars nonparametrically using Fragment Grammars. We present Monte Carlo methods for performing inference, and show how structured policy priors lead to substantially faster learning in complex domains compared to methods without inductive biases. This work was supported by AFOSR FA9550-07-1-0075 and ONR N00014-07-1-0937. SJG was supported by a Graduate Research Fellowship from the NSF. 2013-04-18T00:45:04Z 2013-04-18T00:45:04Z 2013-04-12 http://hdl.handle.net/1721.1/78573 MIT-CSAIL-TR-2013-007 17 p. application/pdf |
spellingShingle | Wingate, David Diuk, Carlos O'Donnell, Timothy Tenenbaum, Joshua Gershman, Samuel Compositional Policy Priors |
title | Compositional Policy Priors |
title_full | Compositional Policy Priors |
title_fullStr | Compositional Policy Priors |
title_full_unstemmed | Compositional Policy Priors |
title_short | Compositional Policy Priors |
title_sort | compositional policy priors |
url | http://hdl.handle.net/1721.1/78573 |
work_keys_str_mv | AT wingatedavid compositionalpolicypriors AT diukcarlos compositionalpolicypriors AT odonnelltimothy compositionalpolicypriors AT tenenbaumjoshua compositionalpolicypriors AT gershmansamuel compositionalpolicypriors |