On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing con...

Bibliographic Details
Main Authors: Yu, Huizhen; Bertsekas, Dimitri P.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Institute for Operations Research and the Management Sciences (INFORMS) 2015
Online Access: http://hdl.handle.net/1721.1/93744
https://orcid.org/0000-0001-6909-7208