An expectation maximization algorithm for continuous Markov decision processes with arbitrary rewards
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequentl...
Main authors: | , , , |
---|---|
Format: | Journal article |
Language: | English |
Published in: | 2009 |