On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
Main Authors: | Jaakkola, Tommi; Jordan, Michael I.; Singh, Satinder P. |
---|---|
Institution: | Massachusetts Institute of Technology |
Language: | en_US |
Published: | 2004 (originally issued 1993-08-01 as AIM-1441 / CBCL-084; 15 p.) |
Subjects: | reinforcement learning; stochastic approximation; convergence; dynamic programming |
Online Access: | http://hdl.handle.net/1721.1/7205 |
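
For orientation, below is a minimal sketch of the class of algorithms the abstract refers to: the tabular Q-learning update of Watkins (1989), written in its stochastic-approximation form. The two-state MDP, its dynamics, and all constants here are hypothetical illustrations, not taken from the memo; only the update rule itself corresponds to the algorithm class the paper proves convergent.

```python
import random

# Minimal tabular Q-learning (Watkins, 1989) on a hypothetical 2-state,
# 2-action MDP. The environment and constants are illustrative only;
# the memo is about proving convergence of this class of update rule.

N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9  # discount factor

def step(state, action):
    """Toy MDP dynamics (hypothetical): return (reward, next_state)."""
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    next_state = random.randrange(N_STATES)
    return reward, next_state

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
visits = [[0] * N_ACTIONS for _ in range(N_STATES)]

state = 0
for _ in range(50_000):
    action = random.randrange(N_ACTIONS)  # purely exploratory policy
    reward, next_state = step(state, action)
    visits[state][action] += 1
    alpha = 1.0 / visits[state][action]  # per-pair step size

    # Stochastic-approximation form of the Q-learning update:
    #   Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma max_a' Q(s',a')]
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])
    state = next_state

print(Q)  # entries approximate the optimal action values of the toy MDP
```

With step sizes alpha_t(s,a) = 1/n_t(s,a), the usual Robbins-Monro conditions (the step sizes sum to infinity while their squares sum to a finite value) hold for every state-action pair visited infinitely often, which is the regime addressed by the memo's convergence theorem.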