Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-factors. The main difference is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm...
Main Authors: | Bertsekas, Dimitri P.; Yu, Huizhen |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Article |
Language: | en_US |
Published: | Institute for Operations Research and the Management Sciences (INFORMS), 2019 |
Online Access: | https://hdl.handle.net/1721.1/121248 |
Similar Items
- Distributed Asynchronous Policy Iteration in Dynamic Programming
  by: Bertsekas, Dimitri P., et al.
  Published: (2011)
- Q-learning and policy iteration algorithms for stochastic shortest path problems
  by: Yu, Huizhen, et al.
  Published: (2015)
- On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
  by: Yu, Huizhen, et al.
  Published: (2015)
- Multiagent value iteration algorithms in dynamic programming and reinforcement learning
  by: Dimitri Bertsekas
  Published: (2020-12-01)
- Approximate policy iteration: A survey and some new methods
  by: Bertsekas, Dimitri P.
  Published: (2012)