Least Squares Temporal Difference Methods: An Analysis under General Conditions

We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) with the least squares temporal difference (LSTD) algorithm, LSTD($\lambda$), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain differe...

Full description

Bibliographic Details
Main Author:	Yu, Huizhen
Other Authors:	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Format:	Article
Language:	en_US
Published:	Society for Industrial and Applied Mathematics 2013
Online Access:	http://hdl.handle.net/1721.1/77629

_version_	1811075183358771200
author	Yu, Huizhen
author2	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
author_facet	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems Yu, Huizhen
author_sort	Yu, Huizhen
collection	MIT
description	We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) with the least squares temporal difference (LSTD) algorithm, LSTD($\lambda$), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain different from the one corresponding to the policy under evaluation. We establish for the discounted cost criterion that LSTD($\lambda$) converges almost surely under mild, minimal conditions. We also analyze other properties of the iterates involved in the algorithm, including convergence in mean and boundedness. Our analysis draws on theories of both finite space Markov chains and weak Feller Markov chains on a topological space. Our results can be applied to other temporal difference algorithms and MDP models. As examples, we give a convergence analysis of a TD($\lambda$) algorithm and extensions to MDP with compact state and action spaces, as well as a convergence proof of a new LSTD algorithm with state-dependent $\lambda$-parameters.
first_indexed	2024-09-23T10:02:09Z
format	Article
id	mit-1721.1/77629
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T10:02:09Z
publishDate	2013
publisher	Society for Industrial and Applied Mathematics
record_format	dspace
spelling	mit-1721.1/776292022-09-30T18:25:21Z Least Squares Temporal Difference Methods: An Analysis under General Conditions Yu, Huizhen Massachusetts Institute of Technology. Laboratory for Information and Decision Systems Yu, Huizhen We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) with the least squares temporal difference (LSTD) algorithm, LSTD($\lambda$), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain different from the one corresponding to the policy under evaluation. We establish for the discounted cost criterion that LSTD($\lambda$) converges almost surely under mild, minimal conditions. We also analyze other properties of the iterates involved in the algorithm, including convergence in mean and boundedness. Our analysis draws on theories of both finite space Markov chains and weak Feller Markov chains on a topological space. Our results can be applied to other temporal difference algorithms and MDP models. As examples, we give a convergence analysis of a TD($\lambda$) algorithm and extensions to MDP with compact state and action spaces, as well as a convergence proof of a new LSTD algorithm with state-dependent $\lambda$-parameters. 2013-03-12T18:09:37Z 2013-03-12T18:09:37Z 2012-12 2010-09 Article http://purl.org/eprint/type/JournalArticle 0363-0129 1095-7138 http://hdl.handle.net/1721.1/77629 Yu, Huizhen. “Least Squares Temporal Difference Methods: An Analysis Under General Conditions.” SIAM Journal on Control and Optimization 50.6 (2012): 3310–3343. © 2012, Society for Industrial and Applied Mathematics en_US http://dx.doi.org/10.1137/100807879 SIAM Journal on Control and Optimization Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Society for Industrial and Applied Mathematics SIAM
spellingShingle	Yu, Huizhen Least Squares Temporal Difference Methods: An Analysis under General Conditions
title	Least Squares Temporal Difference Methods: An Analysis under General Conditions
title_full	Least Squares Temporal Difference Methods: An Analysis under General Conditions
title_fullStr	Least Squares Temporal Difference Methods: An Analysis under General Conditions
title_full_unstemmed	Least Squares Temporal Difference Methods: An Analysis under General Conditions
title_short	Least Squares Temporal Difference Methods: An Analysis under General Conditions
title_sort	least squares temporal difference methods an analysis under general conditions
url	http://hdl.handle.net/1721.1/77629
work_keys_str_mv	AT yuhuizhen leastsquarestemporaldifferencemethodsananalysisundergeneralconditions

Least Squares Temporal Difference Methods: An Analysis under General Conditions

Similar Items