PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Latent Factor Representation

Offline reinforcement learning, where a policy is learned from a fixed dataset of trajectories without further interaction with the environment, is one of the greatest challenges in reinforcement learning. Despite its compelling application to large, real-world datasets, existing RL benchmarks have...

Bibliographic Details
Main Author:	Yang, Cindy X.
Other Authors:	Shah, Devavrat
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online Access:	https://hdl.handle.net/1721.1/139130

_version_	1826207418929905664
author	Yang, Cindy X.
author2	Shah, Devavrat
author_facet	Shah, Devavrat Yang, Cindy X.
author_sort	Yang, Cindy X.
collection	MIT
description	Offline reinforcement learning, where a policy is learned from a fixed dataset of trajectories without further interaction with the environment, is one of the greatest challenges in reinforcement learning. Despite its compelling application to large, real-world datasets, existing RL benchmarks have struggled to perform well in the offline setting. In this thesis, we consider offline RL with heterogeneous agents (i.e., varying state dynamics) under severe data scarcity, where only one historical trajectory per agent is observed. Under these conditions, we find that the performance of state-of-the-art offline and model-based RL methods degrades significantly. To tackle this problem, we present PerSim, a method that learns a personalized simulator for each agent, leveraging historical data across all agents, prior to learning a policy. We achieve this by positing that the transition dynamics across agents are a latent function of latent factors associated with agents, states, and actions. Subsequently, we theoretically establish that this function is well-approximated by a “low-rank” decomposition of separable agent, state, and action latent functions. This representation suggests a simple, regularized neural network architecture to effectively learn the transition dynamics per agent, even with scarce, offline data. In extensive experiments performed on RL methods and popular benchmark environments from OpenAI Gym and MuJoCo, we show that PerSim consistently achieves improved performance, as measured by average reward and prediction error.
first_indexed	2024-09-23T13:49:22Z
format	Thesis
id	mit-1721.1/139130
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T13:49:22Z
publishDate	2022
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/139130 2022-01-15T03:25:07Z PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Latent Factor Representation Yang, Cindy X. Shah, Devavrat Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng.
2022-01-14T14:51:44Z 2022-01-14T14:51:44Z 2021-06 2021-06-17T20:15:01.317Z Thesis https://hdl.handle.net/1721.1/139130 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle	Yang, Cindy X. PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Latent Factor Representation
title	PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Latent Factor Representation
title_sort	persim data efficient offline reinforcement learning with heterogeneous agents via latent factor representation
url	https://hdl.handle.net/1721.1/139130
work_keys_str_mv	AT yangcindyx persimdataefficientofflinereinforcementlearningwithheterogeneousagentsvialatentfactorrepresentation
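
The "low-rank" decomposition mentioned in the description can be illustrated with a small numerical sketch. This is not the thesis's implementation: it assumes discrete agents, states, and actions, a scalar transition quantity, and hypothetical sizes N, S, A and rank R, whereas the record describes learned latent functions inside a neural network. It only shows what a separable agent, state, and action factorization looks like and why it is low rank.

```python
import numpy as np


def low_rank_transition(agent_f, state_f, action_f):
    """Combine separable latent factors into a transition tensor.

    agent_f has shape (N, R), state_f (S, R), action_f (A, R).
    Returns T with T[n, s, a] = sum_r agent_f[n, r] * state_f[s, r] * action_f[a, r].
    """
    return np.einsum("nr,sr,ar->nsa", agent_f, state_f, action_f)


# Hypothetical sizes, chosen only for illustration (not from the thesis).
N, S, A, R = 5, 7, 3, 2

rng = np.random.default_rng(0)
agent_f = rng.normal(size=(N, R))    # one latent vector per agent
state_f = rng.normal(size=(S, R))    # one latent vector per state
action_f = rng.normal(size=(A, R))   # one latent vector per action

T = low_rank_transition(agent_f, state_f, action_f)

# Flattening the state and action axes gives an N x (S*A) matrix whose rank
# is at most R, regardless of how many agents there are.
agent_unfolding = T.reshape(N, S * A)
rank = np.linalg.matrix_rank(agent_unfolding)
print(T.shape, rank)
```

Because every agent's row lies in the same R-dimensional subspace, data from all agents constrain the shared state and action factors, which is the intuition behind pooling trajectories across agents when each agent contributes very little data.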
