An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward

We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterised in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequentl...

Bibliographic details
Main authors:	Hoffman, M, de Freitas, N, Doucet, A, Peters, J
Format:	Journal article
Published:	2009
