Trans-dimensional MCMC for Bayesian policy learning
A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new interpretation, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to implement full Bayesian policy search, without the need for gradients and with a single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulation methods in situations where the reward is a rare event.
Main Authors: Hoffman, M; Doucet, A; De Freitas, N; Jasra, A
Format: Journal article
Language: English
Published: 2009
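To make the trans-dimensional idea in the abstract concrete, the following is a minimal toy sketch of a reversible jump MCMC sampler over a variable-length trajectory, targeting a distribution proportional to the reward. The Gaussian random-walk dynamics, the Gaussian reward `reward`, the geometric horizon prior `GAMMA`, and all parameter values are illustrative assumptions of this sketch, not the authors' model or implementation.

```python
# Toy sketch: sample (k, x_{1:k}) from a target proportional to
# p(k) * p(x_{1:k}) * r(x_k), where the horizon k varies across moves.
# All modeling choices below are assumptions made for illustration only.
import numpy as np

rng = np.random.default_rng(0)

GAMMA = 0.95               # geometric horizon prior: p(k) proportional to GAMMA**k
STEP = 0.5                 # std of the random-walk transition x_{t+1} ~ N(x_t, STEP^2)
TARGET, R_STD = 3.0, 0.3   # reward r(x) = exp(-(x - TARGET)^2 / (2 R_STD^2)), rare from x_1 = 0

def reward(x):
    # Unnormalized Gaussian bump; the tiny constant guards against underflow.
    return np.exp(-0.5 * ((x - TARGET) / R_STD) ** 2) + 1e-300

def log_trans(x_next, x_prev):
    # log N(x_next; x_prev, STEP^2) up to an additive constant
    return -0.5 * ((x_next - x_prev) / STEP) ** 2

def rjmcmc(n_iters=50_000):
    traj = [0.0]           # start with horizon k = 1; x_1 ~ N(0, STEP^2) by assumption
    horizons = []
    for _ in range(n_iters):
        u = rng.random()
        if u < 1 / 3:
            # Birth move: append x_{k+1} drawn from the transition kernel.
            # Proposing from the kernel cancels the trajectory-prior factor,
            # so the acceptance ratio reduces to GAMMA * r(x_{k+1}) / r(x_k).
            x_new = traj[-1] + STEP * rng.standard_normal()
            if rng.random() < GAMMA * reward(x_new) / reward(traj[-1]):
                traj.append(x_new)
        elif u < 2 / 3 and len(traj) > 1:
            # Death move: drop the last state (the reverse of a birth).
            if rng.random() < reward(traj[-2]) / (GAMMA * reward(traj[-1])):
                traj.pop()
        else:
            # Fixed-dimension move: random-walk update of one state.
            i = rng.integers(len(traj))
            prop = traj[i] + 0.25 * rng.standard_normal()
            prev = traj[i - 1] if i > 0 else 0.0
            log_a = log_trans(prop, prev) - log_trans(traj[i], prev)
            if i < len(traj) - 1:
                log_a += log_trans(traj[i + 1], prop) - log_trans(traj[i + 1], traj[i])
            else:
                # The terminal state also enters the target through the reward.
                log_a += np.log(reward(prop)) - np.log(reward(traj[i]))
            if np.log(rng.random()) < log_a:
                traj[i] = prop
        horizons.append(len(traj))
    return np.asarray(horizons)

if __name__ == "__main__":
    ks = rjmcmc()
    print("mean sampled horizon after burn-in:", ks[10_000:].mean())
```

In this sketch, longer trajectories survive the birth/death moves only when their terminal state lands in the high-reward region, which is the sense in which the chain targets a reward-proportional distribution over both the horizon and the path, rather than estimating reward by forward simulation.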