Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge

The problem of sample complexity of online reinforcement learning is often studied in the literature without taking into account any partial knowledge about the system dynamics that could potentially accelerate the learning process. In this thesis, we study the sample complexity of online Q-learning methods when some prior knowledge about the dynamics is available or can be learned efficiently. We focus on systems that evolve according to an additive disturbance model, where the underlying dynamics are described by a deterministic function of states and actions, along with an unknown additive disturbance that is independent of states and actions. In the setting of finite Markov decision processes, we present an optimistic Q-learning algorithm that achieves Õ(√T) regret without polynomial dependency on the number of states and actions under perfect knowledge of the dynamics function. This is in contrast to the typical Õ(√SAT) regret for existing Q-learning methods. Further, if only a noisy estimate of the dynamics function is available, our method can learn an approximately optimal policy in a number of samples that is independent of the cardinalities of the state and action spaces. The sub-optimality gap depends on the approximation error of the noisy estimate, as well as the Lipschitz constant of the corresponding optimal value function. Our approach does not require modeling of the transition probabilities and enjoys the same memory complexity as model-free methods.
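
The additive-disturbance structure is what makes the sample sharing possible: once the dynamics function is known, a single observed transition reveals a disturbance sample that is valid for every state-action pair. The sketch below (plain Python, not taken from the thesis) illustrates that intuition on a toy finite MDP using optimistically initialized tabular Q-learning; all names, the discounted objective, and the specific update rule are hypothetical simplifications rather than the thesis's actual algorithm.

import numpy as np

# Illustrative sketch only (not the thesis's algorithm): tabular Q-learning with
# optimistic initialization for an additive-disturbance MDP on a finite state
# space, s' = (f(s, a) + w) mod S, where the disturbance w is independent of
# (s, a). All names below (f, reward, sample_env_step, S, A, ...) are
# hypothetical placeholders. The point being illustrated: when f is known,
# every observed transition yields a disturbance sample w = s' - f(s, a) that
# applies to *all* state-action pairs, so the Q-update can be replayed across
# the whole table without collecting extra samples -- the intuition behind
# removing the poly(S, A) dependence from the sample complexity.

def q_learning_known_f(f, reward, sample_env_step, S, A,
                       gamma=0.95, steps=20_000):
    Q = np.full((S, A), 1.0 / (1.0 - gamma))   # optimistic initialization
    s, t = 0, 0
    for _ in range(steps):
        a = int(np.argmax(Q[s]))               # act greedily w.r.t. optimistic Q
        s_next = sample_env_step(s, a)         # one real sample from the system
        w = (s_next - f(s, a)) % S             # recovered disturbance sample
        t += 1
        lr = 1.0 / t                           # simple decaying step size
        # Replay the same disturbance sample at every (state, action) pair:
        # under the additive model, (ss, aa) would have transitioned to
        # (f(ss, aa) + w) mod S with this very disturbance.
        for ss in range(S):
            for aa in range(A):
                ns = (f(ss, aa) + w) % S
                target = reward(ss, aa) + gamma * np.max(Q[ns])
                Q[ss, aa] = (1 - lr) * Q[ss, aa] + lr * target
        s = s_next
    return Q

# Hypothetical toy instance: 10 states on a ring, 2 actions (move -1 or +1),
# disturbance w in {0, 1} with a distribution unknown to the learner.
if __name__ == "__main__":
    S, A = 10, 2
    rng = np.random.default_rng(0)
    f = lambda s, a: (s + (1 if a == 1 else -1)) % S
    reward = lambda s, a: 1.0 if s == 0 else 0.0
    sample_env_step = lambda s, a: (f(s, a) + rng.choice([0, 1], p=[0.8, 0.2])) % S
    Q = q_learning_known_f(f, reward, sample_env_step, S, A)
    print(np.argmax(Q, axis=1))                # greedy policy per state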


Bibliographic Details
Main Author: Alharbi, Meshal
Other Authors: Roozbehani, Mardavij; Dahleh, Munther A.
Departments: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Center for Computational Science and Engineering
Degree: S.M.
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/155510
Rights: In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/