An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterised in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequentl...
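The analytical tractability mentioned in the abstract can be illustrated with a standard Gaussian identity: if the state is Gaussian, x ~ N(m, S), and the reward is a mixture r(x) = Σ_k w_k N(x; μ_k, Σ_k), then E[r(x)] = Σ_k w_k N(m; μ_k, S + Σ_k), because ∫ N(x; m, S) N(x; μ, Σ) dx = N(m; μ, S + Σ). The sketch below is not the authors' EM algorithm. It only shows this closed-form expectation and how it combines with mean/covariance propagation through linear Gaussian dynamics x_{t+1} = A x_t + B u_t + ε, ε ~ N(0, Q). The deterministic linear policy u_t = K x_t + k and the function names `expected_mog_reward` and `rollout_expected_reward` are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) at x; cov must be symmetric positive definite."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * (d * np.log(2 * np.pi) + logdet + quad))


def expected_mog_reward(m, S, weights, means, covs):
    """E[r(x)] for x ~ N(m, S) and a mixture-of-Gaussians reward
    r(x) = sum_k w_k N(x; mu_k, Sigma_k), via the identity
    integral N(x; m, S) N(x; mu, Sigma) dx = N(m; mu, S + Sigma)."""
    return sum(w * gaussian_pdf(m, mu, S + C)
               for w, mu, C in zip(weights, means, covs))


def rollout_expected_reward(A, B, K, k, Q, m0, S0, horizon,
                            weights, means, covs):
    """Sum of expected rewards at t = 0, ..., horizon - 1 for
    x_{t+1} = A x_t + B u_t + eps, eps ~ N(0, Q), under the
    (hypothetical) linear policy u_t = K x_t + k. The state stays Gaussian,
    so only its mean and covariance need to be propagated."""
    m, S = m0, S0
    total = 0.0
    for _ in range(horizon):
        total += expected_mog_reward(m, S, weights, means, covs)
        F = A + B @ K                # closed-loop dynamics matrix
        m = F @ m + B @ k            # propagate the mean
        S = F @ S @ F.T + Q          # propagate the covariance
    return total
```

Because every expectation is available in closed form, the total expected reward is a differentiable function of the policy parameters K and k. This is the kind of quantity that a numerical optimizer could then work on, although how the paper splits the work between its E-step and M-step is not stated in this record.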
Main creators: | |
---|---|
Format: | Journal article |
Published / Created: | 2009 |