Approximate policy iteration for Markov decision processes via quantitative adaptive aggregations
We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number o...
Main Authors: | , , |
---|---|
Format: | Conference item |
Published: | Springer Verlag, 2016 |