On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated he...
Main Authors: | , , |
---|---|
Language: | en_US |
Published: |
2004
|
Subjects: | |
Online Access: | http://hdl.handle.net/1721.1/7205 |