Q-learning and policy iteration algorithms for stochastic shortest path problems
We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the...
Main Authors: | Yu, Huizhen, Bertsekas, Dimitri P. |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Article |
Language: | en_US |
Published: | Springer-Verlag, 2015 |
Online Access: | http://hdl.handle.net/1721.1/93745 https://orcid.org/0000-0001-6909-7208 |
Similar Items
- On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
  by: Yu, Huizhen, et al.
  Published: (2015)
- Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
  by: Bertsekas, Dimitri P, et al.
  Published: (2019)
- Distributed Asynchronous Policy Iteration in Dynamic Programming
  by: Bertsekas, Dimitri P., et al.
  Published: (2011)
- An analysis of stochastic shortest path problems
  Published: (2003)
- Stochastic shortest path problems with recourse
  Published: (2003)