Effective offline training and efficient online adaptation

Full description

Developing agents that behave intelligently in the world is an open challenge in machine learning. Desiderata for such agents are efficient exploration, maximizing long-term utility, and the ability to effectively leverage prior data to solve new tasks. Reinforcement learning (RL) is an approach predicated on learning by directly interacting with an environment through trial and error, and it presents a way for us to train and deploy such agents. Moreover, combining RL with powerful neural network function approximators (a sub-field known as “deep RL”) has shown promise towards achieving this goal. For instance, deep RL has yielded agents that can play Go at superhuman levels, improve the efficiency of microchip designs, and learn complex novel strategies for controlling nuclear fusion reactions.

A key issue that stands in the way of deploying deep RL is poor sample efficiency. Concretely, while it is possible to train effective agents using deep RL, the key successes have largely been in environments where we have access to large amounts of online interaction, often through the use of simulators. However, many real-world problems confront us with scenarios where samples are expensive to obtain. One way to alleviate this issue is to exploit prior data, often termed “offline data”, which can accelerate how quickly we learn such agents: for example, by leveraging exploratory data to avoid redundant deployments, or by using human-expert data to quickly guide agents towards promising behaviors. However, the best way to incorporate this data into existing deep RL algorithms is not straightforward; naïvely pre-training on this offline data with RL algorithms (a paradigm called “offline RL”) as a starting point for subsequent learning is often detrimental. Moreover, it is unclear how to explicitly derive useful behaviors online that are positively influenced by this offline pre-training.

With these factors in mind, this thesis follows a three-pronged strategy towards improving sample efficiency in deep RL. First, we investigate effective pre-training on offline data. Then, we tackle the online problem, looking at efficient adaptation to environments when operating purely online. Finally, we conclude with hybrid strategies that use offline data to explicitly augment policies when acting online.
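To make the third prong concrete, the sketch below illustrates one generic way offline data can augment online learning: every update is computed on a batch that mixes transitions from a fixed offline buffer with transitions the agent has just collected. This is an illustrative, hypothetical example rather than the thesis's actual method; the toy MDP, buffer sizes, 50/50 mixing ratio, and the use of tabular Q-learning in place of a deep RL update are all assumptions made for the sketch.

```python
"""
Illustrative sketch (not the thesis's algorithm): off-policy updates drawn from a
mixture of offline and online data. Tabular Q-learning on a small random MDP
stands in for a deep RL agent; all sizes and ratios below are assumptions.
"""
import numpy as np

rng = np.random.default_rng(0)

# A tiny random MDP standing in for a real environment.
N_STATES, N_ACTIONS = 10, 4
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))  # transition probabilities
R = rng.normal(size=(N_STATES, N_ACTIONS))                        # rewards
GAMMA = 0.95

def step(s, a):
    """Sample a next state and reward from the toy MDP."""
    s_next = rng.choice(N_STATES, p=P[s, a])
    return s_next, R[s, a]

class ReplayBuffer:
    """Minimal FIFO buffer of (s, a, r, s') transitions."""
    def __init__(self, capacity=10_000):
        self.capacity, self.data = capacity, []
    def add(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append(transition)
    def sample(self, n):
        idx = rng.integers(len(self.data), size=n)
        return [self.data[i] for i in idx]

def mixed_batch(offline_buf, online_buf, batch_size, offline_ratio=0.5):
    """Draw a batch that mixes offline data with freshly collected online data."""
    n_off = int(batch_size * offline_ratio)
    return offline_buf.sample(n_off) + online_buf.sample(batch_size - n_off)

# "Offline data": transitions gathered by some prior behavior policy (here, random).
offline_buf = ReplayBuffer()
for _ in range(5_000):
    s, a = rng.integers(N_STATES), rng.integers(N_ACTIONS)
    s_next, r = step(s, a)
    offline_buf.add((s, a, r, s_next))

# Online phase: act epsilon-greedily, store new data, update from mixed batches.
Q = np.zeros((N_STATES, N_ACTIONS))
online_buf = ReplayBuffer()
s, alpha, eps = 0, 0.1, 0.1
for t in range(2_000):
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    online_buf.add((s, a, r, s_next))
    s = s_next
    for (bs, ba, br, bs_next) in mixed_batch(offline_buf, online_buf, batch_size=32):
        # Standard Q-learning target; a deep RL agent would update a critic network here.
        Q[bs, ba] += alpha * (br + GAMMA * Q[bs_next].max() - Q[bs, ba])

print("Greedy policy after hybrid offline/online training:", Q.argmax(axis=1))
```

In a deep RL setting the same mixed-batch routine would feed actor and critic updates rather than the tabular rule; the mixing ratio and any regularization towards the offline data are design choices, not fixed by this sketch.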

Bibliographic Details
Main Author: Ball, P
Other Authors: Roberts, S
Format: Thesis
Language: English
Published: 2023
Institution: University of Oxford