Information-theoretic Algorithms for Model-free Reinforcement Learning
In this work, we propose a model-free reinforcement learning algorithm for infinite-horizon, average-reward decision processes where the transition function has a finite yet unknown dependence on history, and where the induced Markov Decision Process is assumed to be weakly communicating. This algorithm combines the Lempel-Ziv (LZ) parsing tree structure for states introduced in [4] with the optimistic Q-learning approach in [9]. We mathematically analyze the algorithm toward showing sublinear regret, providing the major steps of such a proof. In doing so, we reduce the proof to showing sublinearity of a key quantity: the sum, over steps, of a per-step uncertainty metric. Simulations of the algorithm are deferred to later work.
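The two ingredients named in the abstract can be sketched concretely. Below is a minimal, illustrative Python sketch — our own assumptions throughout, not code from the thesis: an LZ78-style parsing tree whose current node serves as the agent's state, in the spirit of the construction attributed to [4], combined with a tabular optimistic Q-learning update in the spirit of [9] (optimistic initialization, decaying learning rate, visit-count exploration bonus). The hyperparameters `H` and `c`, the toy environment, and all identifiers are hypothetical.

```python
import math
from collections import defaultdict


class LZStateTree:
    """LZ78-style parsing tree over observation symbols, sketching the
    state construction attributed to [4]; details are our assumptions."""

    def __init__(self):
        self.children = defaultdict(dict)  # node id -> {symbol: child node id}
        self.num_nodes = 1                 # node 0 is the root
        self.current = 0                   # current context, used as the RL state

    def step(self, symbol):
        """Advance the context by one symbol and return the new state id.
        On leaving the known tree, a new leaf is created and parsing
        restarts at the root, mirroring LZ78 phrase construction."""
        kids = self.children[self.current]
        if symbol in kids:
            self.current = kids[symbol]
        else:
            kids[symbol] = self.num_nodes  # grow the tree by one node
            self.num_nodes += 1
            self.current = 0               # restart at the root
        return self.current


class OptimisticQLearner:
    """Tabular optimistic Q-learning in the spirit of [9]. H (an
    effective-horizon constant) and c (a bonus scale) are assumed
    hyperparameters, not values taken from the thesis."""

    def __init__(self, actions, H=100, c=1.0):
        self.actions = list(actions)
        self.H, self.c = H, c
        self.Q = defaultdict(lambda: float(H))  # optimistic initial values
        self.N = defaultdict(int)               # visit counts per (state, action)

    def act(self, state):
        # Greedy action with respect to the optimistic Q estimates.
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        self.N[(state, action)] += 1
        n = self.N[(state, action)]
        alpha = (self.H + 1) / (self.H + n)     # decaying learning rate
        bonus = self.c * math.sqrt(self.H / n)  # optimism bonus shrinking in n
        v_next = max(self.Q[(next_state, a)] for a in self.actions)
        target = reward + v_next + bonus
        self.Q[(state, action)] = (1 - alpha) * self.Q[(state, action)] + alpha * target


if __name__ == "__main__":
    import random

    def env_step(action):
        # Toy stand-in environment, purely illustrative: a binary
        # observation and a reward for matching it.
        obs = random.randint(0, 1)
        return obs, (1.0 if action == obs else 0.0)

    tree, agent = LZStateTree(), OptimisticQLearner(actions=[0, 1])
    state = tree.current
    for _ in range(10_000):
        action = agent.act(state)
        obs, reward = env_step(action)
        next_state = tree.step(obs)
        agent.update(state, action, reward, next_state)
        state = next_state
```

Regret here would presumably be measured in the usual average-reward sense, Reg(T) = T·ρ* − Σ_{t=1}^T r_t for optimal gain ρ* (our assumption of the standard definition); the abstract's reduction amounts to showing that the sum of a per-step uncertainty metric grows sublinearly in T, and the shrinking bonus above is a loose analogue of such a quantity.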
Main Author: | Wu, Farrell Eldrian S. |
---|---|
Other Authors: | Farias, Vivek F. |
Format: | Thesis (M.Eng.) |
Department: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Published: | Massachusetts Institute of Technology, 2023 |
Rights: | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); copyright retained by author(s) |
Online Access: | https://hdl.handle.net/1721.1/152649 |