Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-factors. The main difference is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm...
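For orientation, here is a minimal sketch of the classical policy iteration that the abstract contrasts with. Its policy evaluation phase solves the linear system J = g_mu + alpha * P_mu * J, and its improvement phase picks, at each state, the action with the smallest Q-factor. This is not the authors' new algorithm; the abstract is truncated before it describes their evaluation step. The data layout is an assumption for illustration: P[i][u][j] is the transition probability, g[i][u] is the expected one-stage cost, costs are minimized, and alpha is the discount factor. All function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (pure Python)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def evaluate_policy(P, g, mu, alpha):
    """Classical evaluation: solve (I - alpha * P_mu) J = g_mu for the cost of policy mu."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - alpha * P[i][mu[i]][j] for j in range(n)]
         for i in range(n)]
    b = [g[i][mu[i]] for i in range(n)]
    return solve_linear(A, b)


def q_factors(P, g, J, alpha):
    """Q(i, u) = g(i, u) + alpha * sum_j P[i][u][j] * J(j)."""
    n = len(P)
    return [[g[i][u] + alpha * sum(P[i][u][j] * J[j] for j in range(n))
             for u in range(len(P[i]))] for i in range(n)]


def policy_iteration(P, g, alpha, mu=None, max_iter=1000):
    """Alternate exact evaluation and greedy improvement until the policy is stable."""
    n = len(P)
    mu = list(mu) if mu is not None else [0] * n
    for _ in range(max_iter):
        J = evaluate_policy(P, g, mu, alpha)
        Q = q_factors(P, g, J, alpha)
        greedy = [min(range(len(Q[i])), key=lambda u: Q[i][u]) for i in range(n)]
        # Keep the current action on ties so the loop terminates.
        new_mu = [mu[i] if Q[i][mu[i]] <= Q[i][greedy[i]] else greedy[i]
                  for i in range(n)]
        if new_mu == mu:
            return mu, J, Q
        mu = new_mu
    raise RuntimeError("policy iteration did not converge within max_iter")


# Hypothetical 2-state, 2-action example.
# State 0: action 0 stays (cost 2), action 1 moves to state 1 (cost 0.5).
# State 1: action 0 stays (cost 1), action 1 moves to state 0 (cost 3).
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
g = [[2.0, 0.5],
     [1.0, 3.0]]
mu, J, Q = policy_iteration(P, g, alpha=0.9)
# mu == [1, 0], J is approximately [9.5, 10.0]
```

Solving the n-by-n system exactly in every iteration is the cost that the abstract's new evaluation phase is meant to avoid; the sketch only shows the baseline it replaces.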


Bibliographic Details
Main Authors: Bertsekas, Dimitri P, Yu, Huizhen
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Institute for Operations Research and the Management Sciences (INFORMS) 2019
Online Access: https://hdl.handle.net/1721.1/121248