Convergence Results for Some Temporal Difference Methods Based on Least Squares

We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost func...


Bibliographic Details
Main Authors: Yu, Huizhen; Bertsekas, Dimitri P.
Other Authors: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Format: Article
Language: en_US
Published: Institute of Electrical and Electronics Engineers 2012
Online Access: http://hdl.handle.net/1721.1/74102
https://orcid.org/0000-0001-6909-7208