Least Squares Temporal Difference Methods: An Analysis under General Conditions

We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) with the least squares temporal difference (LSTD) algorithm, LSTD($\lambda$), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain different from the one corresponding to the policy under evaluation.
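For readers who want a concrete picture of the setting described above, the sketch below implements one standard off-policy LSTD($\lambda$) recursion in Python, with importance-sampling ratios correcting for the mismatch between the observed (exploration) chain and the chain of the policy under evaluation. The function name lstd_lambda, the feature map phi, the ratio sequence rho, and the ridge term reg are illustrative choices rather than the paper's notation, and the exact recursion and conditions analyzed in the paper may differ.

```python
import numpy as np

def lstd_lambda(transitions, phi, gamma, lam, rho=None, reg=1e-6):
    """Hedged sketch of off-policy LSTD(lambda) for policy evaluation.

    transitions : iterable of (s, cost, s_next) observed along the behavior chain
    phi         : feature map, phi(s) -> 1-D numpy array of length d
    gamma       : discount factor in (0, 1)
    lam         : trace parameter lambda in [0, 1]
    rho         : rho[t] = ratio of target-chain to behavior-chain transition
                  probabilities for the t-th transition; None means on-policy
                  (all ratios equal to 1)
    reg         : small ridge term so the linear system is always solvable
    """
    transitions = list(transitions)
    d = phi(transitions[0][0]).shape[0]
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)                           # eligibility trace
    for t, (s, cost, s_next) in enumerate(transitions):
        w = 1.0 if rho is None else rho[t]    # importance ratio for this transition
        z = gamma * lam * z + phi(s)          # accumulate the discounted trace
        A += np.outer(z, phi(s) - gamma * w * phi(s_next))
        b += z * (w * cost)
        z = w * z                             # carry the ratio into the next trace
    theta = np.linalg.solve(A + reg * np.eye(d), b)
    return theta                              # approximate cost J(s) = phi(s) @ theta
```

With rho=None the sketch reduces to ordinary on-policy LSTD($\lambda$); the small ridge term is only there to keep the example well posed on short trajectories and is not part of the algorithm being analyzed.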


Bibliographic Details
Main Author: Yu, Huizhen
Other Authors: Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Material Type: Article
Language: en_US
Published: Society for Industrial and Applied Mathematics, 2013
Links: http://hdl.handle.net/1721.1/77629