On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated he...
Main Authors: | Jaakkola, Tommi; Jordan, Michael I.; Singh, Satinder P. |
---|---|
Language: | en_US |
Published: | 2004 |
Subjects: | |
Online Access: | http://hdl.handle.net/1721.1/7205 |
Similar Items
- 6.231 Dynamic Programming and Stochastic Control, Fall 2011
  by: Bertsekas, Dimitri
  Published: (2011)
- 6.231 Dynamic Programming and Stochastic Control, Fall 2002
  by: Bertsekas, Dimitri P.
  Published: (2002)
- 6.231 Dynamic Programming and Stochastic Control, Fall 2008
  by: Bertsekas, Dimitri
  Published: (2008)
- Towards Feature Selection In Actor-Critic Algorithms
  by: Rohanimanesh, Khashayar, et al.
  Published: (2007)
- Stochastic Combinatorial Optimization with Risk
  by: Nikolova, Evdokia
  Published: (2008)