Least Squares Temporal Difference Methods: An Analysis under General Conditions
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) with the least squares temporal difference (LSTD) algorithm, LSTD($\lambda$), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain differe...
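The abstract concerns the LSTD($\lambda$) algorithm; for orientation, below is a minimal sketch of the standard on-policy LSTD($\lambda$) recursion, in which an eligibility trace is decayed by $\gamma\lambda$ each step and the simulation data are accumulated into a matrix $A$ and vector $b$ whose solution gives the linear value-function parameters. This is not the paper's method verbatim: the exploration-enhanced (off-policy) variant analyzed in the paper additionally weights the trace with importance-sampling ratios. The names `lstd_lambda`, `phi`, and the ridge term `reg` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lstd_lambda(trajectory, phi, gamma, lam, reg=1e-6):
    """Minimal on-policy LSTD(lambda) sketch (illustrative; the paper's
    off-policy variant also weights traces by importance-sampling ratios).

    trajectory : list of (state, cost, next_state) transitions
    phi        : feature map, state -> 1-D np.ndarray of length d
    gamma      : discount factor in [0, 1)
    lam        : trace-decay parameter lambda in [0, 1]
    reg        : small ridge term so the linear system is always solvable
    """
    d = len(phi(trajectory[0][0]))
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)  # eligibility trace
    for s, cost, s_next in trajectory:
        f, f_next = phi(s), phi(s_next)
        z = gamma * lam * z + f               # decay the trace, add current features
        A += np.outer(z, f - gamma * f_next)  # A += z (phi - gamma * phi')^T
        b += cost * z                         # b += c * z
    # theta solves A theta = b; the value estimate at s is phi(s)' theta
    return np.linalg.solve(A + reg * np.eye(d), b)
```

The ridge term here is only a practical convenience for the sketch; whether and in what sense the unregularized system is solvable under general sampling conditions is among the questions the paper treats rigorously.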
Main Author: | Yu, Huizhen |
---|---|
Other Authors: | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems |
Format: | Article |
Language: | English |
Published: | Society for Industrial and Applied Mathematics, 2013 |
Online Access: | http://hdl.handle.net/1721.1/77629 |
Similar Items
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
  by: Yu, Huizhen, et al.
  Published: (2012)
- Generalized Least Squares
  by: Kariya, Takeaki, et al.
  Published: (2004)
- Gauss–Newton–Secant Method for Solving Nonlinear Least Squares Problems under Generalized Lipschitz Conditions
  by: Ioannis K. Argyros, et al.
  Published: (2021-07-01)
- The Method of Least Squares
  by: Wells, D. E., et al.
  Published: (1971)
- Deep Reinforcement Learning Using Least-Squares Truncated Temporal-Difference
  by: Junkai Ren, et al.
  Published: (2024-04-01)