Optimal Reinforcement Learning with Black Holes
We introduce the Black Hole Reinforcement Learning problem, a previously unexplored variant of reinforcement learning in which we lose all turn information and all reward from trajectories that visit a particular subset of states. We assume awareness of the trajectory loss events, making this a censored data problem but not a truncated data problem.
Main Author: | Micali, Enrico |
---|---|
Other Authors: | Daskalakis, Constantinos |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/143151 |
_version_ | 1811091440755802112 |
---|---|
author | Micali, Enrico |
author2 | Daskalakis, Constantinos |
author_facet | Daskalakis, Constantinos Micali, Enrico |
author_sort | Micali, Enrico |
collection | MIT |
description | We introduce the Black Hole Reinforcement Learning problem, a previously unexplored variant of reinforcement learning in which we lose all turn information and all reward from trajectories that visit a particular subset of states. We assume awareness of the trajectory loss events, making this a censored data problem but not a truncated data problem. We have elected to work in a fixed-horizon setting for easier readability of our arguments in what we hope will be a seminal paper in a new branch of “biased” data science.
We begin by proposing a series of three MDP-inspired models for the Black Hole RL setting, each striking a unique compromise between the compactness of the state space and the optimality of a memory-less policy over the states. For the two models that are memory-less in both the state transition and turn reward functions, we describe a stochastic policy gradient algorithm that guarantees non-asymptotic convergence to a policy with finitely suboptimal expected trajectory reward, focusing on the more compact model which benefits from better convergence rates.
This algorithm’s guarantee hinges on the assumption that every location in the problem environment has a non-zero chance of being an agent’s start state. We dedicate the remainder of the paper to the feasibility of finding optimal policies for Black Hole RL problems in which we don’t have this guarantee. We prove that converging to an optimal policy is feasible under the new assumption that the distribution determining agents’ starting states is known. Problems in the Black Hole RL setting where the starting state distribution is “sparse” and unknown remain open. We provide a potential first avenue of research into their feasibility, which we believe to be the next natural frontier in Black Hole Reinforcement Learning Theory. |
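The censoring mechanism the abstract describes can be illustrated with a minimal toy simulation. The following sketch is not from the thesis: the chain environment, reward function, policy, and the names `rollout`, `step`, and `BLACK_HOLE` are all hypothetical, chosen only to show how a trajectory that visits a "black hole" state loses all turn information and reward, while the loss event itself is still observed (censoring) rather than the whole trajectory silently vanishing (truncation).

```python
import random

# Toy illustration (not the thesis's model): a fixed-horizon chain MDP
# where any trajectory visiting the designated BLACK_HOLE state is lost.
# We still observe the loss event itself -- censored data, not truncated.

HORIZON = 5          # fixed-horizon setting, as in the abstract
BLACK_HOLE = 0       # hypothetical "black hole" state
N_STATES = 4

def step(state, action):
    """Deterministic clamped move on states 0..N_STATES-1 (illustrative)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, float(nxt)  # per-turn reward = state index (arbitrary choice)

def rollout(policy, start, rng):
    """Return (turns, total_reward), or None when the trajectory is censored."""
    state, turns, total = start, [], 0.0
    for _ in range(HORIZON):
        action = policy(state, rng)
        state, r = step(state, action)
        if state == BLACK_HOLE:
            return None          # all turn info and all reward are lost
        turns.append((state, action, r))
        total += r
    return turns, total

rng = random.Random(0)
policy = lambda s, rng: rng.choice([-1, +1])   # uniform random walk
results = [rollout(policy, start=2, rng=rng) for _ in range(1000)]
lost = sum(r is None for r in results)
kept = [t for t in results if t is not None]
print(f"censored trajectories: {lost}/1000")
print(f"mean reward over surviving trajectories: "
      f"{sum(tot for _, tot in kept) / len(kept):.2f}")
```

Averaging reward over only the surviving trajectories, as the last line does, is exactly the biased estimate the censored setting creates: surviving trajectories systematically avoid the states near the black hole.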
first_indexed | 2024-09-23T15:02:28Z |
format | Thesis |
id | mit-1721.1/143151 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T15:02:28Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1431512022-06-16T03:12:14Z Optimal Reinforcement Learning with Black Holes Micali, Enrico Daskalakis, Constantinos Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2022-06-15T12:59:47Z 2022-06-15T12:59:47Z 2022-02 2022-02-22T18:32:27.231Z Thesis https://hdl.handle.net/1721.1/143151 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Micali, Enrico Optimal Reinforcement Learning with Black Holes |
title | Optimal Reinforcement Learning with Black Holes |
title_full | Optimal Reinforcement Learning with Black Holes |
title_fullStr | Optimal Reinforcement Learning with Black Holes |
title_full_unstemmed | Optimal Reinforcement Learning with Black Holes |
title_short | Optimal Reinforcement Learning with Black Holes |
title_sort | optimal reinforcement learning with black holes |
url | https://hdl.handle.net/1721.1/143151 |
work_keys_str_mv | AT micalienrico optimalreinforcementlearningwithblackholes |