Efficient reinforcement learning via singular value decomposition, end-to-end model-based methods and reward shaping

Reinforcement learning (RL) provides a general framework for data-driven decision making. However, the very same generality that makes this approach applicable to a wide range of problems is also responsible for its well-known inefficiencies. In this thesis, we consider properties shared by interesting classes of decision-making problems that can be leveraged to design learning algorithms which are both computationally and data efficient. Specifically, this work examines the low-rank structure found in various aspects of decision-making problems and the sparsity of effects of classical deterministic planning, as well as the properties that end-to-end model-based methods depend on to perform well. We start by showing how low-rank structure in the successor representation enables the design of an efficient online learning algorithm. Similarly, we show how this same structure can be found in the Bellman operator, which we use to formulate an efficient variant of the least-squares temporal difference learning algorithm. We further explore low-rank structure in state features to learn compact transition models which allow for planning entirely in a low-dimensional space. We then take a closer look at end-to-end model-based methods to better understand their properties, examining this type of approach through the lens of constrained optimization and implicit differentiation. Through this implicit perspective, we derive properties of these methods which allow us to identify conditions under which they perform well. We conclude this thesis by exploring how the sparsity of effects of classical planning problems can be used to define general domain-independent heuristics, which can in turn greatly accelerate the learning of domain-dependent heuristics through potential-based reward shaping and lifted function approximation.
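As a rough illustration of the low-rank structure the abstract alludes to (this is a sketch, not the thesis's actual algorithm): the successor representation of a Markov chain with transition matrix P and discount γ is Ψ = (I − γP)⁻¹, and in many structured environments its singular values decay quickly, so a truncated SVD yields a compact approximation. The 50-state random-walk chain below is a hypothetical environment chosen only for illustration.

```python
import numpy as np

# Hypothetical environment: a 50-state reflecting random-walk chain.
n, gamma = 50, 0.95
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n - 1)] += 0.5

# Successor representation: expected discounted state occupancies.
Psi = np.linalg.inv(np.eye(n) - gamma * P)

# Truncated SVD: keep only the top-k singular directions.
U, S, Vt = np.linalg.svd(Psi)
k = 10
Psi_k = U[:, :k] * S[:k] @ Vt[:k, :]

# The spectrum decays, so a rank-10 factorization already captures
# most of the 50x50 matrix Psi.
rel_err = np.linalg.norm(Psi - Psi_k) / np.linalg.norm(Psi)
print(rel_err)
```

Working with the low-rank factors instead of the full Ψ is what makes this kind of structure attractive: storage and per-update cost scale with the rank k rather than with the number of states squared.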

Bibliographic Details
Main Author: Gehring, Clement
Other Authors: Kaelbling, Leslie Pack
Format: Thesis
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/144562
Other Contributors: Lozano-Pérez, Tomás; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Date Issued: 2022-05
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)
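The potential-based reward shaping mentioned in the abstract has a standard form due to Ng, Harada, and Russell (1999): add F(s, s') = γΦ(s') − Φ(s) to the environment reward, which provably leaves the optimal policy unchanged. The sketch below is a minimal, hypothetical instance, assuming a grid world and a Manhattan-distance potential standing in for the domain-independent heuristics the thesis derives.

```python
gamma = 0.99

def potential(state, goal=(4, 4)):
    # Hypothetical potential: negative Manhattan distance to the goal,
    # playing the role of a domain-independent heuristic.
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaped_reward(r, s, s_next):
    # Potential-based shaping term F(s, s') = gamma * Phi(s_next) - Phi(s);
    # adding it to r preserves the set of optimal policies.
    return r + gamma * potential(s_next) - potential(s)

# A step toward the goal earns a positive shaping bonus, densifying
# sparse rewards without changing what behavior is optimal.
print(shaped_reward(0.0, (0, 0), (0, 1)))
```

Because the shaping terms telescope along any trajectory, the shaped return differs from the original return only by a policy-independent function of the start state, which is why the heuristic can speed up learning without biasing the final policy.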