Compositional Policy Priors

This paper describes a probabilistic framework for incorporating structured inductive biases into reinforcement learning. These inductive biases arise from policy priors, probability distributions over optimal policies. Borrowing recent ideas from computational linguistics and Bayesian nonparametrics, we define several families of policy priors that express compositional, abstract structure in a domain. Compositionality is expressed using probabilistic context-free grammars, enabling a compact representation of hierarchically organized sub-tasks. Useful sequences of sub-tasks can be cached and reused by extending the grammars nonparametrically using Fragment Grammars. We present Monte Carlo methods for performing inference, and show how structured policy priors lead to substantially faster learning in complex domains compared to methods without inductive biases.
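
As a rough illustration of the framework the abstract describes, the sketch below expresses a policy prior as a probabilistic context-free grammar and reweights it by reward with a simple Monte Carlo sampler. The grammar, the toy_reward function, the inverse temperature beta, and all symbol names are invented for illustration; this is not the paper's actual domain, grammar, or inference algorithm (which extends PCFGs nonparametrically with Fragment Grammars).

    import math
    import random

    # Policy prior as a probabilistic context-free grammar (PCFG).
    # Nonterminals map to (probability, expansion) pairs; any string not in
    # the table is a terminal (primitive) action. All names are hypothetical.
    PCFG = {
        "Policy":  [(0.5, ["Subtask"]),
                    (0.5, ["Subtask", "Policy"])],   # policies are sub-task sequences
        "Subtask": [(0.4, ["GoTo"]),
                    (0.3, ["PickUp"]),
                    (0.3, ["GoTo", "PickUp"])],      # a reusable composite fragment
        "GoTo":    [(0.5, ["left"]), (0.5, ["right"])],
        "PickUp":  [(1.0, ["grasp"])],
    }

    def sample(symbol="Policy"):
        """Sample a flat action sequence and its log prior probability."""
        if symbol not in PCFG:                       # terminal action
            return [symbol], 0.0
        r, acc = random.random(), 0.0
        for prob, expansion in PCFG[symbol]:
            acc += prob
            if r <= acc:
                actions, logp = [], math.log(prob)
                for child in expansion:
                    child_actions, child_logp = sample(child)
                    actions += child_actions
                    logp += child_logp
                return actions, logp
        return [], 0.0                               # unreachable if rules sum to 1

    def toy_reward(actions):
        """Hypothetical reward: end with a grasp, penalize length."""
        bonus = 2.0 if actions and actions[-1] == "grasp" else 0.0
        return bonus - 0.1 * len(actions)

    def mh_sample_posterior(n_steps=1000, beta=3.0):
        """Independence Metropolis-Hastings targeting prior(pi) * exp(beta * R(pi)).
        Proposals are fresh draws from the prior, so the prior terms cancel and
        acceptance depends only on the reward difference."""
        policy, _ = sample()
        for _ in range(n_steps):
            proposal, _ = sample()
            accept = math.exp(min(0.0, beta * (toy_reward(proposal) - toy_reward(policy))))
            if random.random() < accept:
                policy = proposal
        return policy

    print(mh_sample_posterior())   # e.g. ['left', 'grasp']

Hard-coding the composite "GoTo PickUp" rule is a toy stand-in for what the paper obtains nonparametrically with Fragment Grammars: frequently useful sub-task sequences are cached as reusable units, concentrating the prior on compositional structure and thereby speeding learning.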

Bibliographic Details
Main Authors: Wingate, David; Diuk, Carlos; O'Donnell, Timothy; Tenenbaum, Joshua; Gershman, Samuel
Series: MIT-CSAIL-TR-2013-007
Subject: Computational Cognitive Science
Institution: Massachusetts Institute of Technology
Published: 2013-04-12
Extent: 17 p.
Format: application/pdf
Online Access: http://hdl.handle.net/1721.1/78573
Funding: This work was supported by AFOSR FA9550-07-1-0075 and ONR N00014-07-1-0937. SJG was supported by a Graduate Research Fellowship from the NSF.