On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated he...


Bibliographic Details
Main Authors: Jaakkola, Tommi; Jordan, Michael I.; Singh, Satinder P.
Language: en_US
Published: 2004
Online Access: http://hdl.handle.net/1721.1/7205
