Efficient reinforcement learning via singular value decomposition, end-to-end model-based methods and reward shaping

Reinforcement learning (RL) provides a general framework for data-driven decision making. However, the very same generality that makes this approach applicable to a wide range of problems is also responsible for its well-known inefficiencies. In this thesis, we consider properties shared by interesting classes of decision-making problems that can be leveraged to design learning algorithms which are both computationally and data efficient. Specifically, this work examines the low-rank structure found in various aspects of decision-making problems and the sparsity of effects of classical deterministic planning, as well as the properties that end-to-end model-based methods depend on to perform well. We start by showing how low-rank structure in the successor representation enables the design of an efficient online learning algorithm. Similarly, we show how this same structure can be found in the Bellman operator, which we use to formulate an efficient variant of the least-squares temporal difference learning algorithm. We further explore low-rank structure in state features to learn compact transition models which allow for planning entirely in a low-dimensional space. We then take a closer look at end-to-end model-based methods to better understand their properties, examining this type of approach through the lens of constrained optimization and implicit differentiation. Through this implicit perspective, we derive properties of these methods which allow us to identify conditions under which they perform well. We conclude this thesis by exploring how the sparsity of effects of classical planning problems can be used to define general domain-independent heuristics, which can in turn greatly accelerate the learning of domain-dependent heuristics through potential-based reward shaping and lifted function approximation.
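As a rough illustration of the low-rank structure the abstract alludes to (this is a sketch, not the thesis's actual algorithm): the successor representation of a Markov chain with transition matrix P and discount γ is Ψ = (I − γP)⁻¹, and in many structured environments its singular values decay quickly, so a truncated SVD yields a compact approximation. The 50-state random-walk chain below is a hypothetical environment chosen only for illustration.

```python
import numpy as np

# Hypothetical environment: a 50-state reflecting random-walk chain.
n, gamma = 50, 0.95
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n - 1)] += 0.5

# Successor representation: expected discounted state occupancies.
Psi = np.linalg.inv(np.eye(n) - gamma * P)

# Truncated SVD: keep only the top-k singular directions.
U, S, Vt = np.linalg.svd(Psi)
k = 10
Psi_k = U[:, :k] * S[:k] @ Vt[:k, :]

# The spectrum decays, so a rank-10 factorization already captures
# most of the 50x50 matrix Psi.
rel_err = np.linalg.norm(Psi - Psi_k) / np.linalg.norm(Psi)
print(rel_err)
```

Working with the low-rank factors instead of the full Ψ is what makes this kind of structure attractive: storage and per-update cost scale with the rank k rather than with the number of states squared.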

Bibliographic Details
Main Author: Gehring, Clement
Other Authors: Kaelbling, Leslie Pack
Format: Thesis
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/144562
Other Contributors: Lozano-Pérez, Tomás; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Date Issued: 2022-05
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)
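The potential-based reward shaping mentioned in the abstract has a standard form due to Ng, Harada, and Russell (1999): add F(s, s') = γΦ(s') − Φ(s) to the environment reward, which provably leaves the optimal policy unchanged. The sketch below is a minimal, hypothetical instance, assuming a grid world and a Manhattan-distance potential standing in for the domain-independent heuristics the thesis derives.

```python
gamma = 0.99

def potential(state, goal=(4, 4)):
    # Hypothetical potential: negative Manhattan distance to the goal,
    # playing the role of a domain-independent heuristic.
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaped_reward(r, s, s_next):
    # Potential-based shaping term F(s, s') = gamma * Phi(s_next) - Phi(s);
    # adding it to r preserves the set of optimal policies.
    return r + gamma * potential(s_next) - potential(s)

# A step toward the goal earns a positive shaping bonus, densifying
# sparse rewards without changing what behavior is optimal.
print(shaped_reward(0.0, (0, 0), (0, 1)))
```

Because the shaping terms telescope along any trajectory, the shaped return differs from the original return only by a policy-independent function of the start state, which is why the heuristic can speed up learning without biasing the final policy.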