Information-theoretic Algorithms for Model-free Reinforcement Learning

In this work, we propose a model-free reinforcement learning algorithm for infinite-horizon, average-reward decision processes in which the transition function has a finite yet unknown dependence on history, and in which the induced Markov Decision Process is assumed to be weakly communicating. This algorithm combines the Lempel-Ziv (LZ) parsing tree structure for states introduced in [4] with the optimistic Q-learning approach of [9]. We mathematically analyze the algorithm with the goal of showing sublinear regret, providing the major steps of such a proof. In doing so, we reduce the proof to showing sublinearity of a key quantity related to the sum, over time steps, of an uncertainty metric. Simulations of the algorithm are deferred to later work.

Bibliographic Details
Main Author: Wu, Farrell Eldrian S.
Other Authors: Farias, Vivek F.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Online Access: https://hdl.handle.net/1721.1/152649
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: M.Eng.
License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); copyright retained by author(s); https://creativecommons.org/licenses/by-nc-nd/4.0/
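Since the thesis itself is only summarized in this record, the following Python sketch is purely illustrative of the two ingredients named in the abstract: an LZ78-style parse tree over the observation history acting as the state representation, and a Q-learning update made optimistic through a count-based exploration bonus. The class and function names, the bonus form, the 1/n step size, and the use of a discount factor (the thesis targets the average-reward criterion) are assumptions made for illustration, not the author's construction.

```python
# Illustrative sketch only: an LZ parse tree over the observation history used as the
# state representation, combined with an optimistic Q-learning update. The bonus form,
# step size, discounting, and all names are assumptions, not taken from the thesis.
import math
import random
from collections import defaultdict


class LZStateTree:
    """Maintains a Lempel-Ziv (LZ78-style) parse tree over the observation stream.

    Each node corresponds to a phrase of past observations; the current node
    serves as the agent's state. When the walk falls off the tree, a new leaf
    is added and the walk restarts at the root.
    """

    def __init__(self):
        self.children = {0: {}}   # node id -> {observation: child node id}
        self.next_id = 1
        self.current = 0          # current phrase node (the "state")

    def step(self, observation):
        kids = self.children[self.current]
        if observation in kids:
            self.current = kids[observation]
        else:
            # Unseen continuation: grow the tree and restart the phrase at the root.
            kids[observation] = self.next_id
            self.children[self.next_id] = {}
            self.next_id += 1
            self.current = 0
        return self.current


def optimistic_q_update(Q, counts, state, action, reward, next_state, actions,
                        gamma=0.99, bonus_scale=1.0):
    """One optimistic Q-learning step with a count-based exploration bonus.

    The discounted target is a simplification; the thesis addresses the
    average-reward criterion, and its actual bonus and step size may differ.
    """
    counts[(state, action)] += 1
    n = counts[(state, action)]
    alpha = 1.0 / n                              # simple 1/n step size (assumption)
    bonus = bonus_scale / math.sqrt(n)           # optimism via count-based bonus
    target = reward + bonus + gamma * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])


if __name__ == "__main__":
    # Toy usage on a random binary observation stream with two actions.
    actions = [0, 1]
    Q = defaultdict(float)
    counts = defaultdict(int)
    tree = LZStateTree()
    state = tree.current
    for t in range(1000):
        action = max(actions, key=lambda a: Q[(state, a)])
        observation = random.randint(0, 1)       # stand-in for an environment
        reward = float(observation == action)    # stand-in reward signal
        next_state = tree.step(observation)
        optimistic_q_update(Q, counts, state, action, reward, next_state, actions)
        state = next_state
```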