Convergence Results for Some Temporal Difference Methods Based on Least Squares

We consider finite-state Markov decision processes and prove convergence and rate-of-convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost func...

Bibliographic Details
Main Authors: Yu, Huizhen; Bertsekas, Dimitri P.
Other Authors: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Format: Article
Language: en_US
Published: Institute of Electrical and Electronics Engineers 2012
Online Access: http://hdl.handle.net/1721.1/74102
https://orcid.org/0000-0001-6909-7208