Convergence Results for Some Temporal Difference Methods Based on Least Squares
We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost func...
Main Authors: | Yu, Huizhen; Bertsekas, Dimitri P. |
---|---|
Other Authors: | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems |
Format: | Article |
Language: | en_US |
Published: | Institute of Electrical and Electronics Engineers, 2012 |
Online Access: | http://hdl.handle.net/1721.1/74102 https://orcid.org/0000-0001-6909-7208 |
Similar Items
- Least Squares Temporal Difference Methods: An Analysis under General Conditions
  by: Yu, Huizhen
  Published: (2013)
- A unified framework for temporal difference methods
  by: Bertsekas, Dimitri P.
  Published: (2010)
- Pathologies of Temporal Difference Methods in Approximate Dynamic Programming
  by: Bertsekas, Dimitri P.
  Published: (2011)
- Proximal algorithms and temporal difference methods for solving fixed point problems
  by: Bertsekas, Dimitri P
  Published: (2021)
- Basis Function Adaptation Methods for Cost Approximation in MDP
  by: Yu, Huizhen, et al.
  Published: (2010)