Convergence Results for Some Temporal Difference Methods Based on Least Squares
We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost func...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: | Institute of Electrical and Electronics Engineers, 2012 |
Online Access: | http://hdl.handle.net/1721.1/74102 https://orcid.org/0000-0001-6909-7208 |