Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge

The problem of sample complexity of online reinforcement learning is often studied in the literature without taking into account any partial knowledge about the system dynamics that could potentially accelerate the learning process. In this thesis, we study the sample complexity of online Q-learning methods when some prior knowledge about the dynamics is available or can be learned efficiently. We focus on systems that evolve according to an additive disturbance model, where the underlying dynamics are described by a deterministic function of states and actions, along with an unknown additive disturbance that is independent of states and actions. In the setting of finite Markov decision processes, we present an optimistic Q-learning algorithm that achieves Õ(√T) regret without polynomial dependency on the number of states and actions under perfect knowledge of the dynamics function. This is in contrast to the typical Õ(√SAT) regret for existing Q-learning methods. Further, if only a noisy estimate of the dynamics function is available, our method can learn an approximately optimal policy in a number of samples that is independent of the cardinalities of the state and action spaces. The sub-optimality gap depends on the approximation error of the noisy estimate, as well as the Lipschitz constant of the corresponding optimal value function. Our approach does not require modeling of the transition probabilities and enjoys the same memory complexity as model-free methods.
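
The additive-disturbance structure is what makes the sample sharing possible: once the dynamics function is known, a single observed transition reveals a disturbance sample that is valid for every state-action pair. The sketch below (plain Python, not taken from the thesis) illustrates that intuition on a toy finite MDP using optimistically initialized tabular Q-learning; all names, the discounted objective, and the specific update rule are hypothetical simplifications rather than the thesis's actual algorithm.

import numpy as np

# Illustrative sketch only (not the thesis's algorithm): tabular Q-learning with
# optimistic initialization for an additive-disturbance MDP on a finite state
# space, s' = (f(s, a) + w) mod S, where the disturbance w is independent of
# (s, a). All names below (f, reward, sample_env_step, S, A, ...) are
# hypothetical placeholders. The point being illustrated: when f is known,
# every observed transition yields a disturbance sample w = s' - f(s, a) that
# applies to *all* state-action pairs, so the Q-update can be replayed across
# the whole table without collecting extra samples -- the intuition behind
# removing the poly(S, A) dependence from the sample complexity.

def q_learning_known_f(f, reward, sample_env_step, S, A,
                       gamma=0.95, steps=20_000):
    Q = np.full((S, A), 1.0 / (1.0 - gamma))   # optimistic initialization
    s, t = 0, 0
    for _ in range(steps):
        a = int(np.argmax(Q[s]))               # act greedily w.r.t. optimistic Q
        s_next = sample_env_step(s, a)         # one real sample from the system
        w = (s_next - f(s, a)) % S             # recovered disturbance sample
        t += 1
        lr = 1.0 / t                           # simple decaying step size
        # Replay the same disturbance sample at every (state, action) pair:
        # under the additive model, (ss, aa) would have transitioned to
        # (f(ss, aa) + w) mod S with this very disturbance.
        for ss in range(S):
            for aa in range(A):
                ns = (f(ss, aa) + w) % S
                target = reward(ss, aa) + gamma * np.max(Q[ns])
                Q[ss, aa] = (1 - lr) * Q[ss, aa] + lr * target
        s = s_next
    return Q

# Hypothetical toy instance: 10 states on a ring, 2 actions (move -1 or +1),
# disturbance w in {0, 1} with a distribution unknown to the learner.
if __name__ == "__main__":
    S, A = 10, 2
    rng = np.random.default_rng(0)
    f = lambda s, a: (s + (1 if a == 1 else -1)) % S
    reward = lambda s, a: 1.0 if s == 0 else 0.0
    sample_env_step = lambda s, a: (f(s, a) + rng.choice([0, 1], p=[0.8, 0.2])) % S
    Q = q_learning_known_f(f, reward, sample_env_step, S, A)
    print(np.argmax(Q, axis=1))                # greedy policy per state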


Bibliographic Details
Main Author: Alharbi, Meshal
Other Authors: Roozbehani, Mardavij; Dahleh, Munther A.
Departments: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Center for Computational Science and Engineering
Degree: S.M.
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/155510
Rights: In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/