Fast adaptation via meta reinforcement learning


Bibliographic Details
Main Author: Zintgraf, L
Other Authors: Whiteson, S
Format: Thesis
Language: English
Published: 2022
Subjects:
Description
Summary: Reinforcement Learning (RL) is a way to train artificial agents to autonomously interact with the world. In practice, however, RL still has limitations that prohibit the deployment of RL agents in many real-world settings: training takes a long time, typically requires human oversight, and produces specialised agents that can behave unexpectedly in unfamiliar situations. This thesis is motivated by the goal of making RL agents more flexible, robust, and safe to deploy in the real world. We develop agents capable of Fast Adaptation, i.e., agents that can learn new tasks efficiently.

To this end, we use Meta Reinforcement Learning (Meta-RL), where we teach agents not only to act autonomously, but to learn autonomously. We propose four novel Meta-RL methods based on the intuition that adapting fast can be divided into "task inference" (understanding the task) and "task solving" (solving the task). We hypothesise that this split can simplify optimisation and thus improve performance, and is more amenable to downstream tasks. To implement this, we propose a context-based approach, in which the agent conditions on a context that represents its current knowledge about the task. The agent can then use this context to decide whether to learn more about the task, or to try to solve it.

In Chapter 5, we use a deterministic context and establish that it can indeed improve performance and adequately capture the task. In the subsequent chapters, we introduce Bayesian reasoning over the context, to enable decision-making under task uncertainty. By combining Meta-RL, context-based learning, and approximate variational inference, we develop methods to compute approximately Bayes-optimal agents for single-agent settings (Chapter 6) and multi-agent settings (Chapter 7). Finally, Chapter 8 addresses the challenge of meta-learning with sparse rewards, an important setting for many real-world applications. We observe that existing Meta-RL methods can fail entirely when rewards are sparse, and propose a way to overcome this by encouraging the agent to explore during meta-training. We conclude the thesis with a reflection on the work presented in the context of current developments, and a discussion of open questions.

In summary, the contributions in this thesis significantly advance the field of Fast Adaptation via Meta-RL. The agents developed in this thesis adapt faster than previous methods across a variety of tasks, and we can compute approximately Bayes-optimal policies for much more complex task distributions than previously possible. We hope that this work helps drive forward Meta-RL research and, in the long term, the use of RL to address important real-world challenges.
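
Note: the summary describes a context-based approach in which the agent conditions on a representation of its current knowledge about the task, inferred from its interaction history. As a rough illustration only (a generic Python/PyTorch sketch, not the code or architecture from the thesis), the fragment below shows one common way such a setup is wired: a recurrent encoder produces a Gaussian belief over a latent task variable, and a policy conditions on the state together with that belief. All class names, dimensions, and design choices here are hypothetical.

    # Illustrative sketch only: context-conditioned policy with a learned task belief.
    import torch
    import torch.nn as nn

    class TaskInferenceEncoder(nn.Module):
        """Encodes the (state, action, reward) history into a Gaussian belief over a latent task variable."""
        def __init__(self, obs_dim, act_dim, latent_dim, hidden_dim=64):
            super().__init__()
            self.rnn = nn.GRU(obs_dim + act_dim + 1, hidden_dim, batch_first=True)
            self.mu = nn.Linear(hidden_dim, latent_dim)
            self.log_std = nn.Linear(hidden_dim, latent_dim)

        def forward(self, obs, act, rew):
            # obs: (B, T, obs_dim), act: (B, T, act_dim), rew: (B, T, 1)
            h, _ = self.rnn(torch.cat([obs, act, rew], dim=-1))
            h_last = h[:, -1]  # belief after the most recent transition
            return self.mu(h_last), self.log_std(h_last)

    class ContextConditionedPolicy(nn.Module):
        """Policy that conditions on the current state and the inferred task context."""
        def __init__(self, obs_dim, act_dim, latent_dim, hidden_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + 2 * latent_dim, hidden_dim),  # state + belief (mean and std)
                nn.Tanh(),
                nn.Linear(hidden_dim, act_dim),
            )

        def forward(self, obs_t, belief_mu, belief_log_std):
            context = torch.cat([belief_mu, belief_log_std.exp()], dim=-1)
            return self.net(torch.cat([obs_t, context], dim=-1))

In a sketch like this, keeping the belief as an input to the policy (rather than a point estimate of the task) is what lets the agent trade off gathering more information about the task against exploiting what it already knows, which is the intuition behind the task-inference / task-solving split described in the summary.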