On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite-space stochastic shortest path (SSP) problems, which are undiscounted, total-cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing con...
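To make the setting concrete, here is a minimal sketch of asynchronous Q-learning on a toy SSP. The 3-state model (two transient states, one absorbing cost-free state), its transition probabilities, and its costs are hypothetical illustrations, not taken from the paper; the update rule is the standard Q-learning iteration with the absorbing state's value pinned at zero.

```python
import random

# Hypothetical toy SSP (not from the paper): states 0 and 1 are transient,
# state 2 is absorbing and cost-free.
ABSORBING = 2
ACTIONS = (0, 1)

def step(state, action, rng):
    """Sample (cost, next_state) for the toy model."""
    if action == 0:
        # Cost 1; reach the absorbing state with probability 0.5, else stay put.
        return 1.0, ABSORBING if rng.random() < 0.5 else state
    # Cost 2; reach the absorbing state with probability 0.9, else jump to the
    # other transient state.
    return 2.0, ABSORBING if rng.random() < 0.9 else 1 - state

def q_learning(num_updates=20000, seed=0):
    rng = random.Random(seed)
    # Q-values over transient state-action pairs; the absorbing state
    # contributes zero future cost by construction.
    Q = {(s, a): 0.0 for s in (0, 1) for a in ACTIONS}
    visits = dict.fromkeys(Q, 0)
    pairs = list(Q)
    for _ in range(num_updates):
        # Asynchronous flavor: each iteration updates one randomly chosen
        # state-action pair rather than sweeping all of them.
        s, a = rng.choice(pairs)
        cost, s2 = step(s, a, rng)
        future = 0.0 if s2 == ABSORBING else min(Q[(s2, b)] for b in ACTIONS)
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)]  # diminishing stepsize
        Q[(s, a)] += alpha * (cost + future - Q[(s, a)])
    return Q
```

For this toy model the Bellman equation has the fixed point Q(s, 0) = 2 and Q(s, 1) = 2.2 for both transient states, so action 0 is optimal everywhere; under the diminishing stepsize the iterates approach these values.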
Main Authors: Yu, Huizhen; Bertsekas, Dimitri P.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: English
Published: Institute for Operations Research and the Management Sciences (INFORMS), 2015
Online Access: http://hdl.handle.net/1721.1/93744 https://orcid.org/0000-0001-6909-7208
Similar Items
- Q-learning and policy iteration algorithms for stochastic shortest path problems, by Yu, Huizhen, et al. (2015)
- Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming, by Bertsekas, Dimitri P., et al. (2019)
- Stochastic shortest path problems with recourse (2003)
- An analysis of stochastic shortest path problems (2003)
- Distributed Asynchronous Policy Iteration in Dynamic Programming, by Bertsekas, Dimitri P., et al. (2011)