An expectation maximization algorithm for continuous Markov decision processes with arbitrary rewards

We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequentl...
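The abstract says the reward is a mixture of Gaussians and the method exploits analytical tractability. One standard reason such a reward is tractable under linear Gaussian dynamics is this: a Gaussian reward component integrated against a Gaussian state density has a closed form, E[w N(x; m, s)] = w N(mean; m, var + s) for x ~ N(mean, var). The sketch below illustrates only that identity in the scalar case, with a Monte Carlo check. It is not the authors' EM algorithm. The function names, the dynamics parameters `a`, `c`, `q` and the `(w, m, s)` component format are hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_reward(x, components):
    """Hypothetical reward r(x) = sum_j w_j * N(x; m_j, s_j).

    `components` is a list of (weight, mean, variance) tuples (assumed format).
    """
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def propagate(mean, var, a, c, q):
    """One step of assumed scalar linear Gaussian dynamics x' = a*x + c + eps, eps ~ N(0, q)."""
    return a * mean + c, a * a * var + q


def expected_reward(mean, var, components):
    """Closed-form E[r(x)] for x ~ N(mean, var).

    Each Gaussian component integrated against the Gaussian state density
    gives w_j * N(mean; m_j, var + s_j), so no sampling is needed.
    """
    return sum(w * normal_pdf(mean, m, var + s) for w, m, s in components)


def horizon_expected_reward(mean, var, components, a, c, q, horizon):
    """Sum of closed-form expected rewards over `horizon` steps of the assumed dynamics."""
    total = 0.0
    for _ in range(horizon):
        mean, var = propagate(mean, var, a, c, q)
        total += expected_reward(mean, var, components)
    return total


def monte_carlo_reward(mean, var, components, n=100_000, seed=0):
    """Sampling estimate of E[r(x)] for x ~ N(mean, var), used only as a sanity check."""
    rng = random.Random(seed)
    sd = math.sqrt(var)
    return sum(mixture_reward(rng.gauss(mean, sd), components) for _ in range(n)) / n
```

For example, with `components = [(1.0, 2.0, 0.5), (0.5, -1.0, 1.0)]` and a state distribution N(0, 1), `expected_reward(0.0, 1.0, components)` agrees with `monte_carlo_reward(0.0, 1.0, components)` up to sampling error. The scalar case keeps the sketch dependency-free; in the multivariate case the variances become covariance matrices and the same identity holds with Σ + S in place of var + s.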

Bibliographic details
Main authors: Hoffman, M, De Freitas, N, Doucet, A, Peters, J
Format: Journal article
Language: English
Published: 2009