Inductive biases and generalisation for deep reinforcement learning
<p>In this thesis we aim to improve generalisation in deep reinforcement learning. Generalisation is a fundamental challenge for any type of learning, determining how acquired knowledge can be transferred to new, previously unseen situations. We focus on reinforcement learning, a framework describing how artificial agents can learn to interact with their environment to achieve goals. In recent years, by using neural networks to represent agents, it has achieved remarkable success and vastly expanded its scope of possible applications. Our goal is to improve the performance of these agents by allowing them to learn faster, to learn better solutions, and to react robustly to previously unseen situations. To this end, we explore a range of methods and approaches.</p>
<p>We focus on incorporating additional structure, also called inductive biases, into the agent. By focussing on specific yet widely applicable problem domains, we can develop specialised architectures that greatly improve performance. In Chapter 3 we focus on partially observable environments, in which the agent is denied full access to all task-relevant information at every moment in time. In Chapter 4 we turn our attention to multi-task and transfer learning and devise a novel training method for hierarchically structured agents. Our method optimises for the reusability of individual solutions, greatly enhancing performance in transfer settings.</p>
<p>In the second part of this thesis, we turn our attention to regularisation, another form of inductive bias, as a means to improve the generalisation of deep agents. In Chapter 5 we first explore stochastic regularisation in reinforcement learning (RL). While these techniques have proven highly effective in supervised learning, we highlight and overcome difficulties in applying them directly to online RL algorithms, among the most powerful and widely used methods in RL. In Chapter 6 we investigate generalisation in deep RL at a more fundamental level by exploring how transient non-stationarity in the training data can interfere with the stochastic gradient training of neural networks and bias them towards worse solutions. Many state-of-the-art RL algorithms introduce such non-stationarity into training, even in stationary environments, by using a continuously improving policy for data collection. We propose a novel framework that reduces the non-stationarity experienced by the trained policy, thereby allowing for improved generalisation.</p>
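To make the partial-observability setting of Chapter 3 concrete: when a single observation does not reveal the full environment state, a standard architectural inductive bias is a recurrent policy whose hidden state summarises the observation history. The following is a minimal, generic PyTorch sketch of that idea; it is illustrative only, not the specific architecture developed in the thesis.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Generic recurrent policy for partially observable tasks.

    The LSTM hidden state acts as a learned summary (a belief-like
    statistic) of the observation history, compensating for the fact
    that a single observation does not reveal the full state.
    """

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.LSTMCell(hidden_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, n_actions)

    def initial_state(self, batch_size: int):
        # Zero hidden and cell state at the start of an episode.
        h = torch.zeros(batch_size, self.rnn.hidden_size)
        return h, h.clone()

    def forward(self, obs, state):
        # Fold the new observation into the recurrent summary.
        h, c = self.rnn(torch.relu(self.encoder(obs)), state)
        logits = self.policy_head(h)
        return torch.distributions.Categorical(logits=logits), (h, c)
```

At each environment step the agent samples an action from the returned distribution and carries the `(h, c)` state forward, so information from earlier observations can inform later decisions.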
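For Chapter 5, one difficulty the abstract alludes to is that stochastic regularisation such as dropout, if left active while collecting data, makes the behaviour policy itself noisy and biases on-policy gradient estimates. The sketch below shows one common mitigation, acting without noise and enabling it only for the learning update; this is an illustration of the problem and a simple remedy, not necessarily the method proposed in the thesis.

```python
import torch
import torch.nn as nn

class DropoutPolicy(nn.Module):
    """Policy network with stochastic regularisation (dropout).

    Acting in eval mode keeps the behaviour policy deterministic with
    respect to the dropout noise; the noise is switched on only when
    computing the learning update.
    """

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(hidden, n_actions),
        )

    @torch.no_grad()
    def act(self, obs):
        self.eval()  # dropout off: noise-free behaviour policy for rollouts
        dist = torch.distributions.Categorical(logits=self.net(obs))
        return dist.sample()

    def train_logits(self, obs):
        self.train()  # dropout on: regularisation applied only during learning
        return self.net(obs)
```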
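For Chapter 6, the abstract does not spell out the proposed framework. One plausible way to shield a policy from transient non-stationarity, consistent with the description, is to periodically distil the current agent into a freshly initialised network, which then only ever sees near-stationary targets. The sketch below assumes this distillation view; `teacher`, `make_student`, and the data loader are hypothetical placeholders, not names from the thesis.

```python
import torch
import torch.nn.functional as F

def distill_to_fresh_network(teacher, make_student, data_loader,
                             steps: int = 1000, lr: float = 3e-4):
    """Distil a trained policy into a freshly initialised network.

    Because the teacher is fixed during distillation, the student is
    trained on a stationary target and never experiences the transient
    non-stationarity the teacher saw during RL training.
    """
    student = make_student()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    it = iter(data_loader)
    for _ in range(steps):
        try:
            obs = next(it)
        except StopIteration:
            it = iter(data_loader)  # restart over the replayed observations
            obs = next(it)
        with torch.no_grad():
            target = F.softmax(teacher(obs), dim=-1)  # fixed teacher policy
        # KL(teacher || student) as the distillation loss.
        loss = F.kl_div(F.log_softmax(student(obs), dim=-1),
                        target, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```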
Main Author: | Igl, M
Other Authors: | Whiteson, S; Abate, A; White, M
Format: | Thesis
Language: | English
Published: | 2021
Subjects: | Reinforcement learning; Machine learning
collection | OXFORD |
id | oxford-uuid:9fdfadb0-e527-4421-9a22-8466c9fed9c8 |
institution | University of Oxford |
record_format | dspace |