Effective offline training and efficient online adaptation

Full description

Developing agents that behave intelligently in the world is an open challenge in machine learning. Desiderata for such agents are efficient exploration, maximizing long-term utility, and the ability to effectively leverage prior data to solve new tasks. Reinforcement learning (RL) is an approach predicated on learning by directly interacting with an environment through trial and error, and it presents a way for us to train and deploy such agents. Moreover, combining RL with powerful neural network function approximators (a sub-field known as “deep RL”) has shown promise towards achieving this goal. For instance, deep RL has yielded agents that can play Go at superhuman levels, improve the efficiency of microchip designs, and learn complex novel strategies for controlling nuclear fusion reactions.

A key issue that stands in the way of deploying deep RL is poor sample efficiency. Concretely, while it is possible to train effective agents using deep RL, the key successes have largely been in environments where we have access to large amounts of online interaction, often through the use of simulators. However, many real-world problems confront us with scenarios where samples are expensive to obtain. One way to alleviate this issue is to exploit prior data, often termed “offline data”, which can accelerate how quickly we learn such agents: for example, by leveraging exploratory data to avoid redundant deployments, or by using human-expert data to quickly guide agents towards promising behaviors. However, the best way to incorporate this data into existing deep RL algorithms is not straightforward; naïvely pre-training on this offline data with RL algorithms (a paradigm called “offline RL”) as a starting point for subsequent learning is often detrimental. Moreover, it is unclear how to explicitly derive useful behaviors online that are positively influenced by this offline pre-training.

With these factors in mind, this thesis follows a three-pronged strategy towards improving sample efficiency in deep RL. First, we investigate effective pre-training on offline data. Then, we tackle the online problem, looking at efficient adaptation to environments when operating purely online. Finally, we conclude with hybrid strategies that use offline data to explicitly augment policies when acting online.
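To make the third prong concrete, the sketch below illustrates one generic way offline data can augment online learning: every update is computed on a batch that mixes transitions from a fixed offline buffer with transitions the agent has just collected. This is an illustrative, hypothetical example rather than the thesis's actual method; the toy MDP, buffer sizes, 50/50 mixing ratio, and the use of tabular Q-learning in place of a deep RL update are all assumptions made for the sketch.

```python
"""
Illustrative sketch (not the thesis's algorithm): off-policy updates drawn from a
mixture of offline and online data. Tabular Q-learning on a small random MDP
stands in for a deep RL agent; all sizes and ratios below are assumptions.
"""
import numpy as np

rng = np.random.default_rng(0)

# A tiny random MDP standing in for a real environment.
N_STATES, N_ACTIONS = 10, 4
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))  # transition probabilities
R = rng.normal(size=(N_STATES, N_ACTIONS))                        # rewards
GAMMA = 0.95

def step(s, a):
    """Sample a next state and reward from the toy MDP."""
    s_next = rng.choice(N_STATES, p=P[s, a])
    return s_next, R[s, a]

class ReplayBuffer:
    """Minimal FIFO buffer of (s, a, r, s') transitions."""
    def __init__(self, capacity=10_000):
        self.capacity, self.data = capacity, []
    def add(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append(transition)
    def sample(self, n):
        idx = rng.integers(len(self.data), size=n)
        return [self.data[i] for i in idx]

def mixed_batch(offline_buf, online_buf, batch_size, offline_ratio=0.5):
    """Draw a batch that mixes offline data with freshly collected online data."""
    n_off = int(batch_size * offline_ratio)
    return offline_buf.sample(n_off) + online_buf.sample(batch_size - n_off)

# "Offline data": transitions gathered by some prior behavior policy (here, random).
offline_buf = ReplayBuffer()
for _ in range(5_000):
    s, a = rng.integers(N_STATES), rng.integers(N_ACTIONS)
    s_next, r = step(s, a)
    offline_buf.add((s, a, r, s_next))

# Online phase: act epsilon-greedily, store new data, update from mixed batches.
Q = np.zeros((N_STATES, N_ACTIONS))
online_buf = ReplayBuffer()
s, alpha, eps = 0, 0.1, 0.1
for t in range(2_000):
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    online_buf.add((s, a, r, s_next))
    s = s_next
    for (bs, ba, br, bs_next) in mixed_batch(offline_buf, online_buf, batch_size=32):
        # Standard Q-learning target; a deep RL agent would update a critic network here.
        Q[bs, ba] += alpha * (br + GAMMA * Q[bs_next].max() - Q[bs, ba])

print("Greedy policy after hybrid offline/online training:", Q.argmax(axis=1))
```

In a deep RL setting the same mixed-batch routine would feed actor and critic updates rather than the tabular rule; the mixing ratio and any regularization towards the offline data are design choices, not fixed by this sketch.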

Bibliographic Details
Main Author: Ball, P
Other Authors: Roberts, S
Format: Thesis
Language: English
Published: 2023
Institution: University of Oxford