Least Squares Temporal Difference Methods: An Analysis under General Conditions
We consider approximate policy evaluation for finite-state and finite-action Markov decision processes (MDPs) with the least squares temporal difference (LSTD) algorithm, LSTD($\lambda$), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain differe...
Main author: | |
---|---|
Other authors: | |
Material type: | Article |
Language: | en_US |
Published: | Society for Industrial and Applied Mathematics, 2013 |
Links: | http://hdl.handle.net/1721.1/77629 |