Pathologies of Temporal Difference Methods in Approximate Dynamic Programming
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iterati...
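For readers unfamiliar with the method class named above, the following is a minimal sketch of approximate policy iteration with a TD-style policy evaluation step. It is not taken from the paper. Several assumptions apply. Each policy is evaluated by solving the projected Bellman equation with linear features, which is the fixed point that on-policy TD(0)/LSTD(0) converges to under steady-state weighting; no simulation is used. Costs are minimized. The small random MDP, the two-column feature matrix, and the function names (`stationary_distribution`, `projected_evaluation`, `greedy_policy`, `approximate_policy_iteration`) are all hypothetical. The loop records every generated policy and reports whether the sequence settles on a fixed point or returns to an earlier policy.

```python
import numpy as np


def stationary_distribution(P_mu):
    """Steady-state distribution xi of an irreducible transition matrix P_mu."""
    n = P_mu.shape[0]
    # Solve xi^T (P_mu - I) = 0 together with sum(xi) = 1 (least squares on the stacked system).
    A = np.vstack([P_mu.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    xi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xi


def projected_evaluation(P, g, Phi, alpha, mu):
    """Approximate J_mu ~ Phi r by solving Phi^T Xi (Phi - alpha P_mu Phi) r = Phi^T Xi g_mu,
    the projected Bellman equation whose solution TD(0)/LSTD(0) targets (assumed setup)."""
    states = np.arange(Phi.shape[0])
    P_mu = P[mu, states, :]   # row i is P[mu[i], i, :]
    g_mu = g[mu, states]      # expected stage cost at each state under mu
    Xi = np.diag(stationary_distribution(P_mu))
    C = Phi.T @ Xi @ (Phi - alpha * P_mu @ Phi)
    d = Phi.T @ Xi @ g_mu
    r = np.linalg.solve(C, d)
    return Phi @ r


def greedy_policy(P, g, J, alpha):
    """Policy improvement for cost minimization: argmin_u g(i,u) + alpha * sum_j p_ij(u) J(j)."""
    Q = g + alpha * (P @ J)   # shape (num_actions, num_states)
    return np.argmin(Q, axis=0)


def approximate_policy_iteration(P, g, Phi, alpha, max_iters=100):
    """Run approximate PI and report whether the policy sequence converged or cycled."""
    n = Phi.shape[0]
    mu = np.zeros(n, dtype=int)
    history = [tuple(int(a) for a in mu)]
    for _ in range(max_iters):
        J = projected_evaluation(P, g, Phi, alpha, mu)
        mu = greedy_policy(P, g, J, alpha)
        key = tuple(int(a) for a in mu)
        if key == history[-1]:
            return "converged", history   # policy unchanged: a fixed point
        seen = key in history
        history.append(key)
        if seen:
            return "cycled", history      # an earlier, different policy recurs
    return "undecided", history


if __name__ == "__main__":
    # Hypothetical 5-state, 2-action MDP with random positive transition probabilities.
    rng = np.random.default_rng(0)
    n, m = 5, 2
    P = rng.random((m, n, n)) + 0.1
    P /= P.sum(axis=2, keepdims=True)   # each row becomes a probability vector
    g = rng.random((m, n))
    Phi = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # 2 features for 5 states
    status, history = approximate_policy_iteration(P, g, Phi, alpha=0.9)
    print(status, history)
```

Because the policy update is deterministic and there are finitely many policies, the sequence either stops at a fixed point or revisits an earlier policy. With `Phi = np.eye(n)` the evaluation step is exact, and the loop reduces to ordinary policy iteration, which terminates. With fewer features than states, this sketch gives no such guarantee.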
| Field | Value |
|---|---|
| Format: | Article |
| Language: | en_US |
| Published: | Institute of Electrical and Electronics Engineers, 2011 |
| Online Access: | http://hdl.handle.net/1721.1/64641 https://orcid.org/0000-0001-6909-7208 |