An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterised in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequentl...
Main authors: | Hoffman, M, de Freitas, N, Doucet, A, Peters, J |
---|---|
Format: | Journal article |
Published: | 2009 |
Similar records
- An expectation maximization algorithm for continuous Markov decision processes with arbitrary rewards
  by: Hoffman, M, et al.
  Published: (2009)
- Learning to maximize reward rate: a model based on semi-Markov decision processes
  by: Arash Khodadadi, et al.
  Published: (2014-05-01)
- Expectation-maximization algorithms for inference in Dirichlet processes mixture
  by: Kimura, T, et al.
  Published: (2013)
- Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
  by: Quanxin Zhu, et al.
  Published: (2009-01-01)
- New inference strategies for solving Markov Decision Processes using reversible jump MCMC
  by: Hoffman, M, et al.
  Published: (2009)