On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing con...
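The abstract describes the setting, not the update itself. Below is a minimal sketch of asynchronous, undiscounted Q-learning on a hypothetical two-state SSP with an absorbing, cost-free terminal state `T`. It is not the paper's analysis or code. Everything concrete here is an assumption made for illustration: the toy model, the names `TRANSITIONS` and `q_learning_ssp`, the uniformly random choice of which Q-factor to update, and the 1/n stepsize. Transitions are deterministic, so no next-state sampling is needed. In a stochastic model, the next state would be sampled at each update.

```python
import random

# Hypothetical toy SSP: states 0 and 1, plus the absorbing, cost-free terminal state T.
T = "T"
# TRANSITIONS[(state, action)] = (next_state, cost); deterministic for simplicity.
TRANSITIONS = {
    (0, "stop"): (T, 5.0),
    (0, "go"): (1, 1.0),
    (1, "stop"): (T, 1.0),
    (1, "go"): (0, 1.0),
}


def q_learning_ssp(transitions, num_updates=20000, seed=0):
    """Asynchronous, undiscounted Q-learning on a toy SSP (illustrative sketch)."""
    rng = random.Random(seed)
    pairs = list(transitions)
    q = {pair: 0.0 for pair in pairs}
    counts = {pair: 0 for pair in pairs}

    def min_q(state):
        # The terminal state is absorbing and cost-free, so its cost-to-go is 0.
        if state == T:
            return 0.0
        return min(v for (s, _), v in q.items() if s == state)

    for _ in range(num_updates):
        pair = rng.choice(pairs)  # update one Q-factor at a time (asynchronous)
        next_state, cost = transitions[pair]
        counts[pair] += 1
        step = 1.0 / counts[pair]  # diminishing stepsize 1/n for this pair
        target = cost + min_q(next_state)  # no discount factor: total-cost criterion
        q[pair] = (1 - step) * q[pair] + step * target
    return q


if __name__ == "__main__":
    print(q_learning_ssp(TRANSITIONS))
```

In this toy model, every policy either reaches `T` or cycles between states 0 and 1 at positive cost. The optimal Q-factors are therefore Q(0, stop) = 5, Q(0, go) = 2, Q(1, stop) = 1 and Q(1, go) = 3, and the iterates approach these values. Boundedness is trivial here. The paper addresses it for general SSP models.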
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: | Institute for Operations Research and the Management Sciences (INFORMS), 2015 |
Online Access: | http://hdl.handle.net/1721.1/93744 https://orcid.org/0000-0001-6909-7208 |