Inductive biases and generalisation for deep reinforcement learning
<p>In this thesis we aim to improve generalisation in deep reinforcement learning. Generalisation is a fundamental challenge for any type of learning, determining how acquired knowledge can be transferred to new, previously unseen situations. We focus on reinforcement learning, a framework describing how artificial agents can learn to interact with their environment to achieve goals. In recent years, by using neural networks to represent agents, it has achieved remarkable success and vastly expanded its scope of possible applications. Our goal is to improve the performance of these agents by allowing them to learn faster, to learn better solutions, and to react robustly to previously unseen situations. To this end, we explore a range of methods and approaches.</p>
<p>We focus on incorporating additional structure, also called inductive biases, into the agent. By focussing on specific yet widely applicable problem domains, we can develop specialised architectures that greatly improve performance. In Chapter 3 we focus on partially observable environments, in which the agent is denied full access to all task-relevant information at every moment in time. In Chapter 4 we turn our attention to multi-task and transfer learning and devise a novel training method for hierarchically structured agents. Our method optimises for the reusability of individual solutions, greatly enhancing performance in transfer settings.</p>
<p>In the second part of this thesis, we turn our attention to regularisation, another form of inductive bias, as a means to improve the generalisation of deep agents. In Chapter 5 we first explore stochastic regularisation in reinforcement learning (RL). While these techniques have proven highly effective in supervised learning, we highlight and overcome difficulties in applying them directly to online RL algorithms, among the most powerful and widely used methods in RL. In Chapter 6 we investigate generalisation in deep RL at a more fundamental level by exploring how transient non-stationarity in the training data can interfere with the stochastic gradient training of neural networks and bias them towards worse solutions. Many state-of-the-art RL algorithms introduce such non-stationarity into training, even in stationary environments, by using a continuously improving policy for data collection. We propose a novel framework that reduces the non-stationarity experienced by the trained policy, thereby allowing for improved generalisation.</p>
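To make the partial-observability setting of Chapter 3 concrete: when a single observation does not reveal the full environment state, a standard architectural inductive bias is a recurrent policy whose hidden state summarises the observation history. The following is a minimal, generic PyTorch sketch of that idea; it is illustrative only, not the specific architecture developed in the thesis.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Generic recurrent policy for partially observable tasks.

    The LSTM hidden state acts as a learned summary (a belief-like
    statistic) of the observation history, compensating for the fact
    that a single observation does not reveal the full state.
    """

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.LSTMCell(hidden_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, n_actions)

    def initial_state(self, batch_size: int):
        # Zero hidden and cell state at the start of an episode.
        h = torch.zeros(batch_size, self.rnn.hidden_size)
        return h, h.clone()

    def forward(self, obs, state):
        # Fold the new observation into the recurrent summary.
        h, c = self.rnn(torch.relu(self.encoder(obs)), state)
        logits = self.policy_head(h)
        return torch.distributions.Categorical(logits=logits), (h, c)
```

At each environment step the agent samples an action from the returned distribution and carries the `(h, c)` state forward, so information from earlier observations can inform later decisions.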
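For Chapter 5, one difficulty the abstract alludes to is that stochastic regularisation such as dropout, if left active while collecting data, makes the behaviour policy itself noisy and biases on-policy gradient estimates. The sketch below shows one common mitigation, acting without noise and enabling it only for the learning update; this is an illustration of the problem and a simple remedy, not necessarily the method proposed in the thesis.

```python
import torch
import torch.nn as nn

class DropoutPolicy(nn.Module):
    """Policy network with stochastic regularisation (dropout).

    Acting in eval mode keeps the behaviour policy deterministic with
    respect to the dropout noise; the noise is switched on only when
    computing the learning update.
    """

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(hidden, n_actions),
        )

    @torch.no_grad()
    def act(self, obs):
        self.eval()  # dropout off: noise-free behaviour policy for rollouts
        dist = torch.distributions.Categorical(logits=self.net(obs))
        return dist.sample()

    def train_logits(self, obs):
        self.train()  # dropout on: regularisation applied only during learning
        return self.net(obs)
```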
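For Chapter 6, the abstract does not spell out the proposed framework. One plausible way to shield a policy from transient non-stationarity, consistent with the description, is to periodically distil the current agent into a freshly initialised network, which then only ever sees near-stationary targets. The sketch below assumes this distillation view; `teacher`, `make_student`, and the data loader are hypothetical placeholders, not names from the thesis.

```python
import torch
import torch.nn.functional as F

def distill_to_fresh_network(teacher, make_student, data_loader,
                             steps: int = 1000, lr: float = 3e-4):
    """Distil a trained policy into a freshly initialised network.

    Because the teacher is fixed during distillation, the student is
    trained on a stationary target and never experiences the transient
    non-stationarity the teacher saw during RL training.
    """
    student = make_student()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    it = iter(data_loader)
    for _ in range(steps):
        try:
            obs = next(it)
        except StopIteration:
            it = iter(data_loader)  # restart over the replayed observations
            obs = next(it)
        with torch.no_grad():
            target = F.softmax(teacher(obs), dim=-1)  # fixed teacher policy
        # KL(teacher || student) as the distillation loss.
        loss = F.kl_div(F.log_softmax(student(obs), dim=-1),
                        target, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```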
Main Author: | Igl, M
Other Authors: | Whiteson, S; Abate, A; White, M
Format: | Thesis
Language: | English
Published: | 2021
Subjects: | Reinforcement learning; Machine learning
collection | OXFORD |
id | oxford-uuid:9fdfadb0-e527-4421-9a22-8466c9fed9c8 |
institution | University of Oxford |
record_format | dspace |