Effective offline training and efficient online adaptation
Main Author: | Ball, P |
---|---|
Other Authors: | Roberts, S |
Format: | Thesis |
Language: | English |
Published: | 2023 |
description | Developing agents that behave intelligently in the world is an open challenge in machine learning. Desiderata for such agents are efficient exploration, maximizing long-term utility, and the ability to effectively leverage prior data to solve new tasks. Reinforcement learning (RL) is an approach predicated on learning by directly interacting with an environment through trial and error, and it presents a way for us to train and deploy such agents. Moreover, combining RL with powerful neural network function approximators – a sub-field known as “deep RL” – has shown evidence towards achieving this goal. For instance, deep RL has yielded agents that can play Go at superhuman levels, improve the efficiency of microchip designs, and learn complex novel strategies for controlling nuclear fusion reactions.

A key issue that stands in the way of deploying deep RL is poor sample efficiency. Concretely, while it is possible to train effective agents using deep RL, the key successes have largely been in environments where we have access to large amounts of online interaction, often through the use of simulators. However, in many real-world problems we are confronted with scenarios where samples are expensive to obtain. As alluded to above, one way to alleviate this issue is to draw on prior data, often termed “offline data”, which can accelerate how quickly we learn such agents: for example, by leveraging exploratory data to avoid redundant deployments, or by using human-expert data to quickly guide agents towards promising behaviors. However, the best way to incorporate this data into existing deep RL algorithms is not straightforward; naïvely pre-training on this offline data with RL algorithms, a paradigm called “offline RL”, is often a detrimental starting point for subsequent learning. Moreover, it is unclear how to explicitly derive useful behaviors online that are positively influenced by this offline pre-training.

With these factors in mind, this thesis follows a three-pronged strategy towards improving sample efficiency in deep RL. First, we investigate effective pre-training on offline data. Then, we tackle the online problem, looking at efficient adaptation to environments when operating purely online. Finally, we conclude with hybrid strategies that use offline data to explicitly augment policies when acting online.
id | oxford-uuid:991caab8-7b21-41c4-a615-be9708f6d409 |
institution | University of Oxford |
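To make the hybrid offline-plus-online idea in the abstract's final paragraph concrete, below is a minimal, hypothetical sketch of one common pattern: sampling each training minibatch partly from a fixed offline dataset and partly from a buffer of online experience. This is an illustration only, not the specific algorithms developed in the thesis; the class, its parameters (e.g. `OfflineOnlineBuffer`, `mix_ratio`), and the toy transitions are all assumptions introduced here for clarity.

```python
# Hypothetical sketch: mix offline and online transitions in each minibatch.
# Not the thesis's method; names and parameters are illustrative assumptions.
import random
from collections import deque


class OfflineOnlineBuffer:
    """Replay buffer that draws a fixed fraction of each batch from offline data."""

    def __init__(self, offline_transitions, online_capacity=100_000, mix_ratio=0.5):
        self.offline = list(offline_transitions)      # fixed prior ("offline") dataset
        self.online = deque(maxlen=online_capacity)   # filled during online interaction
        self.mix_ratio = mix_ratio                    # fraction of each batch taken offline

    def add(self, transition):
        """Store a transition (s, a, r, s', done) collected online."""
        self.online.append(transition)

    def sample(self, batch_size):
        """Return a shuffled minibatch mixing offline and online transitions."""
        n_offline = int(batch_size * self.mix_ratio)
        n_online = batch_size - n_offline
        # Until enough online experience exists, fall back to more offline samples.
        n_online = min(n_online, len(self.online))
        n_offline = batch_size - n_online
        batch = random.sample(self.offline, n_offline)
        if n_online:
            batch += random.sample(list(self.online), n_online)
        random.shuffle(batch)
        return batch


# Toy usage: transitions are (state, action, reward, next_state, done) tuples.
offline_data = [((0.0,), 0, 0.0, (0.1,), False) for _ in range(1000)]
buffer = OfflineOnlineBuffer(offline_data, mix_ratio=0.5)
buffer.add(((0.1,), 1, 1.0, (0.2,), True))
minibatch = buffer.sample(32)   # feed to any off-policy RL update
```

In such a scheme, the sampled minibatch can be passed to any off-policy update rule, and the mix ratio controls how strongly the prior offline data shapes learning while the agent acts online.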