New inference strategies for solving Markov Decision Processes using reversible jump MCMC

In this paper we build on previous work that uses inference techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution that incorporates more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectories in order to sample more freely. Finally, we show how to incorporate these techniques in a principled manner to obtain estimates of the optimal policy.
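
The abstract refers to the general "control as inference" setup from the prior work it builds on, where policy parameters and trajectories are sampled jointly from a reward-weighted distribution. The following is a minimal sketch of that idea on a toy one-dimensional problem; the toy dynamics, reward, and target density are illustrative assumptions and are not the specific reversible jump construction proposed in the paper.

```python
# Sketch: Metropolis-Hastings over (theta, tau) with target proportional to
# r(tau) * p(tau | theta) * p(theta). The trajectory is resampled from the model
# inside the proposal, so its density cancels in the acceptance ratio.
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, horizon=10):
    """Roll out toy dynamics x_{t+1} = x_t + theta * x_t + noise (an assumption)."""
    x, traj = 1.0, []
    for _ in range(horizon):
        x = x + theta * x + 0.1 * rng.standard_normal()
        traj.append(x)
    return np.array(traj)

def reward(traj):
    """Reward favouring trajectories that stay near the origin."""
    return np.exp(-np.sum(traj ** 2))

def log_accept_term(theta, traj):
    """log r(tau) + log p(theta); p(tau | theta) cancels under this proposal."""
    return np.log(reward(traj) + 1e-300) - 0.5 * theta ** 2

theta, traj = 0.0, simulate(0.0)
samples = []
for _ in range(5000):
    theta_prop = theta + 0.1 * rng.standard_normal()   # random-walk proposal on theta
    traj_prop = simulate(theta_prop)                    # fresh trajectory given theta'
    if np.log(rng.uniform()) < log_accept_term(theta_prop, traj_prop) - log_accept_term(theta, traj):
        theta, traj = theta_prop, traj_prop
    samples.append(theta)

print("reward-weighted posterior mean of theta:", np.mean(samples[1000:]))
```

Resampling the whole trajectory at every step is exactly the strong coupling between policy parameters and trajectories that the abstract says the paper tries to break; the sketch only shows the baseline scheme.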

Bibliographic Details
Main Authors: Hoffman, M, Kueck, H, De Freitas, N, Doucet, A
Format: Journal article
Language: English
Published: 2009