Optimal Reinforcement Learning with Black Holes
We introduce the Black Hole Reinforcement Learning problem, a previously unexplored variant of reinforcement learning in which we lose all turn information and all reward from trajectories that visit a particular subset of states. We assume awareness of the trajectory loss events, making this a censored data problem but not a truncated data problem.
Main Author: | Micali, Enrico |
---|---|
Other Authors: | Daskalakis, Constantinos |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/143151 |
_version_ | 1811091440755802112 |
---|---|
author | Micali, Enrico |
author2 | Daskalakis, Constantinos |
author_facet | Daskalakis, Constantinos Micali, Enrico |
author_sort | Micali, Enrico |
collection | MIT |
description | We introduce the Black Hole Reinforcement Learning problem, a previously unexplored variant of reinforcement learning in which we lose all turn information and all reward from trajectories that visit a particular subset of states. We assume awareness of the trajectory loss events, making this a censored data problem but not a truncated data problem. We have elected to work in a fixed-horizon setting for easier readability of our arguments in what we hope will be a seminal paper in a new branch of “biased” data science.
We begin by proposing a series of three MDP-inspired models for the Black Hole RL setting, each striking a unique compromise between the compactness of the state space and the optimality of a memory-less policy over the states. For the two models that are memory-less in both the state transition and turn reward functions, we describe a stochastic policy gradient algorithm that guarantees non-asymptotic convergence to a policy with finitely suboptimal expected trajectory reward, focusing on the more compact model which benefits from better convergence rates.
This algorithm’s guarantee hinges on the assumption that every location in the problem environment has a non-zero chance of being an agent’s start state. We dedicate the remainder of the paper to the feasibility of finding optimal policies for Black Hole RL problems in which we don’t have this guarantee. We prove that converging to an optimal policy is feasible under the new assumption that the distribution determining agents’ starting states is known. Problems in the Black Hole RL setting where the starting state distribution is “sparse” and unknown remain open. We provide a potential first avenue of research into their feasibility, which we believe to be the next natural frontier in Black Hole Reinforcement Learning Theory. |
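The censoring mechanism the abstract describes can be illustrated with a minimal toy simulation. The following sketch is not from the thesis: the chain environment, reward function, policy, and the names `rollout`, `step`, and `BLACK_HOLE` are all hypothetical, chosen only to show how a trajectory that visits a "black hole" state loses all turn information and reward, while the loss event itself is still observed (censoring) rather than the whole trajectory silently vanishing (truncation).

```python
import random

# Toy illustration (not the thesis's model): a fixed-horizon chain MDP
# where any trajectory visiting the designated BLACK_HOLE state is lost.
# We still observe the loss event itself -- censored data, not truncated.

HORIZON = 5          # fixed-horizon setting, as in the abstract
BLACK_HOLE = 0       # hypothetical "black hole" state
N_STATES = 4

def step(state, action):
    """Deterministic clamped move on states 0..N_STATES-1 (illustrative)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, float(nxt)  # per-turn reward = state index (arbitrary choice)

def rollout(policy, start, rng):
    """Return (turns, total_reward), or None when the trajectory is censored."""
    state, turns, total = start, [], 0.0
    for _ in range(HORIZON):
        action = policy(state, rng)
        state, r = step(state, action)
        if state == BLACK_HOLE:
            return None          # all turn info and all reward are lost
        turns.append((state, action, r))
        total += r
    return turns, total

rng = random.Random(0)
policy = lambda s, rng: rng.choice([-1, +1])   # uniform random walk
results = [rollout(policy, start=2, rng=rng) for _ in range(1000)]
lost = sum(r is None for r in results)
kept = [t for t in results if t is not None]
print(f"censored trajectories: {lost}/1000")
print(f"mean reward over surviving trajectories: "
      f"{sum(tot for _, tot in kept) / len(kept):.2f}")
```

Averaging reward over only the surviving trajectories, as the last line does, is exactly the biased estimate the censored setting creates: surviving trajectories systematically avoid the states near the black hole.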
first_indexed | 2024-09-23T15:02:28Z |
format | Thesis |
id | mit-1721.1/143151 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T15:02:28Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1431512022-06-16T03:12:14Z Optimal Reinforcement Learning with Black Holes Micali, Enrico Daskalakis, Constantinos Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2022-06-15T12:59:47Z 2022-06-15T12:59:47Z 2022-02 2022-02-22T18:32:27.231Z Thesis https://hdl.handle.net/1721.1/143151 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Micali, Enrico Optimal Reinforcement Learning with Black Holes |
title | Optimal Reinforcement Learning with Black Holes |
title_full | Optimal Reinforcement Learning with Black Holes |
title_fullStr | Optimal Reinforcement Learning with Black Holes |
title_full_unstemmed | Optimal Reinforcement Learning with Black Holes |
title_short | Optimal Reinforcement Learning with Black Holes |
title_sort | optimal reinforcement learning with black holes |
url | https://hdl.handle.net/1721.1/143151 |
work_keys_str_mv | AT micalienrico optimalreinforcementlearningwithblackholes |