On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated he...
Main Authors: | Jaakkola, Tommi; Jordan, Michael I.; Singh, Satinder P. |
---|---|
Language: | en_US |
Published: | 2004 |
Subjects: | |
Online access: | http://hdl.handle.net/1721.1/7205 |
Similar Items
- A non-iterative distributed approximate dynamic programming algorithm for frequency security-constrained stochastic economic dispatch
  By: Xiangyong Feng, et al.
  Published: (2025-05-01)
- Stochastic approximation and its applications
  By: Chen, Hanfu
  Published: (2002)
- Approximation and weak convergence methods for random processes, with applications to stochastic systems theory
  By: Kushner, Harold J. (Harold Joseph), 1933-
  Published: (1984)
- Stochastic dynamic programming and the control of queueing systems
  By: Sennott, Linn I., 1943-
  Published: (1999)
- Generalized bounds for convex multistage stochastic programs
  By: Kuhn, Daniel, 1975-
  Published: (2005)