Pathologies of Temporal Difference Methods in Approximate Dynamic Programming

Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iterati...

Bibliographic Details
Main Author: Bertsekas, Dimitri P.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Institute of Electrical and Electronics Engineers 2011
Online Access: http://hdl.handle.net/1721.1/64641
https://orcid.org/0000-0001-6909-7208