Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-factors. The main difference from standard policy iteration is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm...
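For orientation, here is a minimal sketch of the classical policy iteration baseline that the abstract contrasts against. It is not the article's new algorithm, whose description is truncated in this record. Policy evaluation solves the linear system $J_\mu = g_\mu + \alpha P_\mu J_\mu$. Policy improvement then picks, at each state, the control with the smallest Q-factor $Q(i,u) = g(i,u) + \alpha \sum_j p_{ij}(u) J_\mu(j)$. Assumptions, all hypothetical: transitions are stored as `P[i][u][j]`, expected one-stage costs as `G[i][u]`, the discount factor satisfies `alpha < 1`, and the function names are illustrative only.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def evaluate(policy: List[int], P, G, alpha: float) -> List[float]:
    """Classical evaluation: solve (I - alpha * P_mu) J = g_mu exactly."""
    n = len(P)
    a = [[(1.0 if i == j else 0.0) - alpha * P[i][policy[i]][j] for j in range(n)]
         for i in range(n)]
    b = [G[i][policy[i]] for i in range(n)]
    return solve_linear(a, b)


def q_factors(J: List[float], P, G, alpha: float) -> Matrix:
    """Q(i, u) = g(i, u) + alpha * sum_j p_ij(u) * J(j)."""
    return [[G[i][u] + alpha * sum(p * J[j] for j, p in enumerate(P[i][u]))
             for u in range(len(P[i]))]
            for i in range(len(P))]


def policy_iteration(P, G, alpha: float,
                     policy: Optional[List[int]] = None
                     ) -> Tuple[List[float], Matrix, List[int]]:
    """Alternate exact evaluation and greedy improvement until the policy is stable."""
    n = len(P)
    policy = list(policy) if policy is not None else [0] * n
    while True:
        J = evaluate(policy, P, G, alpha)
        Q = q_factors(J, P, G, alpha)
        new_policy = []
        for i in range(n):
            best = min(Q[i])
            # Keep the current control on ties so the loop cannot cycle.
            if Q[i][policy[i]] <= best + 1e-12:
                new_policy.append(policy[i])
            else:
                new_policy.append(min(range(len(Q[i])), key=lambda u: Q[i][u]))
        if new_policy == policy:
            return J, Q, policy
        policy = new_policy


if __name__ == "__main__":
    # Toy two-state problem: in state 0, control 0 stays at cost 1 and
    # control 1 moves to state 1 at cost 3; in state 1, control 0 stays at cost 0.
    P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    G = [[1.0, 3.0], [0.0, 5.0]]
    J, Q, mu = policy_iteration(P, G, alpha=0.9)
    print(mu)  # [1, 0]
```

In the toy problem, staying in state 0 forever costs $1/(1-0.9) = 10$, while moving once to the cost-free state 1 costs 3, so the sketch converges to the policy `[1, 0]` with costs $J = (3, 0)$. The linear solve inside `evaluate` is the step that the abstract says the new algorithm replaces.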
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: | Institute for Operations Research and the Management Sciences (INFORMS), 2019 |
Online Access: | https://hdl.handle.net/1721.1/121248 |