On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.

Bibliographic Details
Main Authors: Jaakkola, Tommi; Jordan, Michael I.; Singh, Satinder P.
Language: en_US
Published: 2004
Subjects: reinforcement learning; stochastic approximation; convergence; dynamic programming
Online Access: http://hdl.handle.net/1721.1/7205
author Jaakkola, Tommi
Jordan, Michael I.
Singh, Satinder P.
collection MIT
description Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
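The description above refers to Watkins's Q-learning as one of the DP-based learning algorithms covered by the paper's convergence theorem. For orientation, the following is a minimal sketch of the standard tabular Q-learning update on a toy chain MDP; the environment, the uniformly random behaviour policy, and the 1/n step-size schedule are illustrative assumptions and are not taken from the paper.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch (Watkins-style update).
# The toy chain MDP, the random behaviour policy, and the 1/n step-size
# schedule are illustrative assumptions, not taken from the paper.

GAMMA = 0.9         # discount factor
N_STATES = 5        # states 0..4; reaching state 4 ends an episode
ACTIONS = (-1, +1)  # move left or right along the chain

def step(state, action):
    """Deterministic toy chain: reward 1 only when the last state is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

Q = defaultdict(float)     # Q[(state, action)] -> current value estimate
visits = defaultdict(int)  # visit counts, used for the decreasing step sizes

for episode in range(2000):
    state = 0
    for _ in range(200):  # cap episode length so the toy run always terminates
        # Q-learning is off-policy: acting uniformly at random still allows the
        # greedy policy with respect to Q to be learned, provided every
        # state-action pair keeps being visited.
        action = random.choice(ACTIONS)
        next_state, reward, done = step(state, action)

        # Watkins update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        visits[(state, action)] += 1
        alpha = 1.0 / visits[(state, action)]  # decreasing step size
        target = reward if done else reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])

        if done:
            break
        state = next_state

greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print("greedy action per state:", greedy)  # expected: +1 (move right) everywhere
```

The decreasing step sizes mirror the stochastic-approximation conditions (step sizes summing to infinity with summable squares) under which iterations of this kind are typically shown to converge.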
id mit-1721.1/7205
institution Massachusetts Institute of Technology
language en_US
publishDate 2004
spelling mit-1721.1/7205 (issued 1993-08-01; report numbers AIM-1441, CBCL-084; 15 p.; application/pdf)
title On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
topic reinforcement learning
stochastic approximation
convergence
dynamic programming
url http://hdl.handle.net/1721.1/7205