Trans-dimensional MCMC for Bayesian policy learning

A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new interpretation, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to implement full Bayesian policy search, without the need for gradients and with a single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulation methods in situations where the reward is a rare event.
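To make the trans-dimensional formulation concrete, the sketch below shows a reversible jump sampler on a toy one-dimensional problem in Python. Everything in it, the linear-Gaussian dynamics, the Gaussian reward, the geometric horizon prior, and the three-move proposal mixture, is an illustrative assumption and not the construction from the paper; it only demonstrates the mechanics of jumping between horizons k while targeting a density proportional to the reward.

# Minimal, hypothetical sketch of reversible jump MCMC for policy search
# on a toy 1-D control problem. Not the authors' implementation: the model,
# reward, horizon prior, and proposal mixture are assumptions.
#
# Target (up to a constant), in the "control as inference" view:
#   pi(k, z_{1:k}, theta)
#     proportional to r(z_k) * p(k) * p(theta) * prod_t p(z_t | z_{t-1}, theta)
# where k is a random horizon, z_{1:k} a trajectory, theta a policy gain.

import math
import random

LAM = 0.9       # geometric horizon prior: p(k) proportional to LAM**k
TARGET = 3.0    # reward peaks when the final state reaches this value

def reward(z_final):
    # strictly positive reward, so the target density is well defined
    return math.exp(-0.5 * (z_final - TARGET) ** 2)

def log_dyn(z_next, z_prev, theta):
    # log N(z_next; theta * z_prev, 1): closed-loop dynamics under gain theta
    return -0.5 * (z_next - theta * z_prev) ** 2

def rjmcmc(n_iters=20000, seed=0):
    rng = random.Random(seed)
    theta, traj = 0.5, [rng.gauss(0.0, 1.0)]  # initial gain and trajectory
    samples = []
    for _ in range(n_iters):
        u = rng.random()
        if u < 1 / 3:
            # Birth move: extend the horizon by drawing one step from the
            # dynamics. The dynamics term cancels with the proposal, so the
            # acceptance ratio is the reward ratio times the prior ratio.
            z_new = rng.gauss(theta * traj[-1], 1.0)
            log_alpha = (math.log(reward(z_new))
                         - math.log(reward(traj[-1])) + math.log(LAM))
            if math.log(rng.random()) < log_alpha:
                traj.append(z_new)
        elif u < 2 / 3 and len(traj) > 1:
            # Death move: truncate the horizon by one step (reverse of birth).
            log_alpha = (math.log(reward(traj[-2]))
                         - math.log(reward(traj[-1])) - math.log(LAM))
            if math.log(rng.random()) < log_alpha:
                traj.pop()
        else:
            # Fixed-dimension move: random-walk update of theta under a
            # N(0, 1) prior. The reward term cancels because it depends only
            # on the final state, which this move leaves unchanged.
            theta_prop = theta + rng.gauss(0.0, 0.1)
            log_alpha = -0.5 * (theta_prop ** 2 - theta ** 2)
            for t in range(1, len(traj)):
                log_alpha += log_dyn(traj[t], traj[t - 1], theta_prop)
                log_alpha -= log_dyn(traj[t], traj[t - 1], theta)
            if math.log(rng.random()) < log_alpha:
                theta = theta_prop
        samples.append((len(traj), theta))
    return samples

Because the birth proposal draws the new state from the model dynamics, the likelihood cancels in the acceptance ratio, leaving only the reward and horizon-prior ratios; this is the sense in which the sampler targets a distribution proportional to the reward. Interior trajectory states are refreshed whenever the horizon shrinks past them and regrows, and the marginal samples of theta weight policies by the reward they earn.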


Bibliographic Details
Main Authors: Hoffman, M, Doucet, A, De Freitas, N, Jasra, A
Format: Journal article
Language: English
Published: 2009
Institution: University of Oxford
Collection: OXFORD
id: oxford-uuid:9059b5b6-87d3-448b-a8da-a59f2f3d104d