Optimal Reinforcement Learning with Black Holes

Full description

We introduce the Black Hole Reinforcement Learning problem, a previously unexplored variant of reinforcement learning in which all turn information and all reward are lost from trajectories that visit a particular subset of states. We assume awareness of the trajectory-loss events, which makes this a censored-data problem rather than a truncated-data problem. We work in a fixed-horizon setting to keep our arguments readable, in what we hope will be a seminal paper in a new branch of “biased” data science.

We begin by proposing a series of three MDP-inspired models for the Black Hole RL setting, each striking a different compromise between the compactness of the state space and the optimality of a memoryless policy over the states. For the two models that are memoryless in both the state-transition and turn-reward functions, we describe a stochastic policy gradient algorithm that guarantees non-asymptotic convergence to a policy with finitely suboptimal expected trajectory reward, focusing on the more compact model, which benefits from better convergence rates. This algorithm’s guarantee hinges on the assumption that every location in the problem environment has a non-zero chance of being an agent’s start state.

We dedicate the remainder of the paper to the feasibility of finding optimal policies for Black Hole RL problems in which this guarantee does not hold. We prove that converging to an optimal policy remains feasible under the new assumption that the distribution determining agents’ starting states is known. Problems in the Black Hole RL setting where the starting-state distribution is “sparse” and unknown remain open. We provide a potential first avenue of research into their feasibility, which we believe to be the next natural frontier in Black Hole Reinforcement Learning theory.
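
To make the censored-trajectory setting concrete, the following is a minimal, hypothetical sketch in Python of a toy fixed-horizon tabular MDP with a designated set of black-hole states, together with a naive REINFORCE-style update that uses only the trajectories that survive. Everything here (the toy environment, the uniform start distribution, the run_episode and reinforce_update helpers) is an illustrative assumption and not a model or algorithm from the thesis; in particular, simply discarding censored episodes, as this sketch does, generally biases the gradient estimate, which is exactly the difficulty the censored-data setting creates.

# Illustrative sketch only: a toy fixed-horizon environment with black-hole
# states whose trajectories are censored (no per-turn data, no reward), while
# the censoring event itself is still observed. This is a hypothetical
# illustration of the setting, not the thesis's models or algorithm.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HORIZON = 6, 2, 5
BLACK_HOLES = {4}                                  # states that swallow a trajectory
START_DIST = np.full(N_STATES, 1.0 / N_STATES)     # full-support start distribution

# Fixed random tabular dynamics and per-turn rewards (stand-ins for an MDP).
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
R = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))

theta = np.zeros((N_STATES, N_ACTIONS))            # softmax policy parameters

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def run_episode():
    """Roll out one fixed-horizon episode.

    Returns (censored, turns, total_reward). If the trajectory visits a
    black-hole state, all turn data and reward are lost; only the fact that
    the loss event happened is reported.
    """
    s = rng.choice(N_STATES, p=START_DIST)
    turns, total = [], 0.0
    for _ in range(HORIZON):
        if s in BLACK_HOLES:
            return True, [], 0.0                   # censored: data and reward lost
        a = rng.choice(N_ACTIONS, p=policy(s))
        turns.append((s, a))
        total += R[s, a]
        s = rng.choice(N_STATES, p=P[s, a])
    return False, turns, total

def reinforce_update(lr=0.1, batch=64):
    """Naive REINFORCE-style update computed from surviving trajectories only."""
    global theta
    grad = np.zeros_like(theta)
    survivors = 0
    for _ in range(batch):
        censored, turns, total = run_episode()
        if censored:
            continue                               # censored episodes contribute no signal
        survivors += 1
        for s, a in turns:
            # grad of log pi(a|s) for a softmax policy is (one_hot(a) - pi(.|s)).
            grad[s] -= total * policy(s)
            grad[s, a] += total
    if survivors:
        theta += lr * grad / survivors

for _ in range(200):
    reinforce_update()

The uniform START_DIST mirrors the abstract’s assumption that every location has a non-zero chance of being an agent’s start state; the open questions described above concern what happens when that assumption, and knowledge of the start distribution itself, are dropped.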

Bibliographic Details
Main Author: Micali, Enrico
Other Authors: Daskalakis, Constantinos
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Thesis (M.Eng.)
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/143151