Inductive biases and generalisation for deep reinforcement learning

In this thesis we aim to improve generalisation in deep reinforcement learning. Generalisation is a fundamental challenge for any type of learning, determining how acquired knowledge can be transferred to new, previously unseen situations. We focus on reinforcement learning (RL), a framework describing how artificial agents can learn to interact with their environment to achieve goals. In recent years, by using neural networks to represent agents, RL has achieved remarkable success and vastly expanded its scope of possible applications. Our goal is to improve the performance of these agents by allowing them to learn faster, to find better solutions, and to react robustly to previously unseen situations. To this end, we explore a range of methods and approaches.

In the first part of the thesis we incorporate additional structure, in the form of inductive biases, into the agent: by targeting specific yet widely applicable problem domains, we can develop specialised architectures that greatly improve performance. In Chapter 3 we address partially observable environments, in which the agent is denied full access to all task-relevant information at every moment in time. In Chapter 4 we turn to multi-task and transfer learning and devise a novel training method for hierarchically structured agents; the method optimises for the reusability of individual solutions, greatly enhancing performance in transfer settings.
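The architectures developed in Chapter 3 are more sophisticated than this, but the inductive bias they build on can be illustrated with a generic recurrent policy that compresses the observation history into an internal memory. The sketch below assumes PyTorch, a discrete action space, and illustrative layer sizes; it is not the thesis's architecture.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Generic policy for partially observable tasks: a GRU summarises
    the observation history into a hidden state that stands in for the
    unobserved environment state. Illustrative only."""

    def __init__(self, obs_dim, act_dim, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs, h):
        x = torch.relu(self.encoder(obs))
        h = self.rnn(x, h)               # update the agent's memory
        logits = self.policy_head(h)     # act on memory, not raw observation
        return torch.distributions.Categorical(logits=logits), h
```

Because the hidden state h is carried across time steps, the agent can act on task-relevant information that is no longer visible in the current observation.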

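Likewise, Chapter 4's training method is not reproduced here, but the kind of agent it trains can be sketched as a high-level policy choosing among reusable sub-policies ("skills"). The unbatched sketch below is hypothetical and assumes the same PyTorch setting as above.

```python
import torch
import torch.nn as nn

class HierarchicalPolicy(nn.Module):
    """A high-level 'manager' picks one of several shared sub-policies
    (skills); the chosen skill emits the primitive action. Illustrative
    only -- not the training method from Chapter 4."""

    def __init__(self, obs_dim, act_dim, n_skills=4, hidden=128):
        super().__init__()
        self.manager = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_skills))
        self.skills = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, act_dim))
            for _ in range(n_skills))

    def forward(self, obs):  # obs: tensor of shape (obs_dim,), one time step
        skill = int(torch.distributions.Categorical(
            logits=self.manager(obs)).sample())
        action_logits = self.skills[skill](obs)
        return torch.distributions.Categorical(logits=action_logits), skill
```

Because the skills are shared, a sub-solution learned on one task can be reused wholesale on another, which is the reusability the thesis's method optimises for.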
In the second part of the thesis we turn to regularisation, another form of inductive bias, as a means of improving the generalisation of deep agents. In Chapter 5 we first explore stochastic regularisation in RL. While such techniques have proven highly effective in supervised learning, we highlight and overcome difficulties in applying them directly to online RL algorithms, among the most powerful and widely used classes of RL methods.
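One way the difficulty manifests: stochastic regularisers such as dropout also perturb the policy that collects the training data, so the data-collecting policy no longer matches the policy the learning update assumes. The sketch below shows one mitigation in the spirit of the chapter (not its exact method): act with the noise disabled, and apply the regularised network only inside the update. It assumes a hypothetical PyTorch policy network that maps observations to an action distribution.

```python
import torch

def collect_step(policy_net, obs):
    """Act with stochastic regularisation switched off, so the behaviour
    policy generating the data stays consistent from step to step."""
    policy_net.eval()                      # dropout becomes a no-op
    with torch.no_grad():
        dist = policy_net(obs)
        action = dist.sample()
        return action, dist.log_prob(action)

def policy_loss(policy_net, obs, actions, logp_old, advantages):
    """Use the noisy, regularised network only when computing gradients,
    where it regularises learning rather than perturbing data collection."""
    policy_net.train()                     # dropout active for the update
    dist = policy_net(obs)
    ratio = torch.exp(dist.log_prob(actions) - logp_old)
    return -(ratio * advantages).mean()    # importance-weighted policy gradient
```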
In Chapter 6 we investigate generalisation in deep RL at a more fundamental level, exploring how transient non-stationarity in the training data can interfere with the stochastic gradient training of neural networks and bias them towards worse solutions. Many state-of-the-art RL algorithms introduce this kind of non-stationarity into training, even in stationary environments, by using a continuously improving policy for data collection. We propose a novel framework that reduces the non-stationarity experienced by the trained policy, thereby allowing for improved generalisation.
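The framework itself is not spelled out in this abstract, but its underlying idea, ensuring that the network ultimately deployed never experiences the early, most non-stationary phase of training, can be sketched as periodically distilling the current agent into a freshly initialised network. All names below are hypothetical.

```python
import torch
import torch.nn.functional as F

def distil_into_fresh_network(teacher, make_student, data_batches,
                              steps=1000, lr=1e-3):
    """Distil the current policy into a freshly initialised network.

    The student is trained only on data gathered by the *current* policy,
    so it is spared the transient non-stationarity the teacher was trained
    under; RL training then continues from the student. Hypothetical
    sketch -- both networks are assumed to map observations to logits.
    """
    student = make_student()                       # fresh random init
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _, obs in zip(range(steps), data_batches):
        with torch.no_grad():
            teacher_logits = teacher(obs)
        loss = F.kl_div(F.log_softmax(student(obs), dim=-1),
                        F.softmax(teacher_logits, dim=-1),
                        reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student                                 # continue RL from here
```
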
Bibliographic Details
Main Author: Igl, M
Other Authors: Whiteson, S
Format: Thesis
Language: English
Published: 2021
Subjects: Reinforcement learning; Machine learning
Institution: University of Oxford