On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing con...

Bibliographic Details
Main Authors:	Yu, Huizhen, Bertsekas, Dimitri P.
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	Institute for Operations Research and the Management Sciences (INFORMS) 2015
Online Access:	http://hdl.handle.net/1721.1/93744 https://orcid.org/0000-0001-6909-7208

Similar Items