Risk-sensitive and robust model-based reinforcement learning and planning

Many sequential decision-making problems that are currently automated, such as those in manufacturing or recommender systems, operate in an environment where there is either little uncertainty or zero risk of catastrophe. As companies and researchers attempt to deploy autonomous systems in less constrained environments, it is increasingly important that we endow sequential decision-making algorithms with the ability to reason about uncertainty and risk.

In this thesis, we address both planning and reinforcement learning (RL) approaches to sequential decision-making. In the planning setting, it is assumed that a model of the environment is provided, and a policy is optimised within that model. Reinforcement learning relies upon extensive random exploration, and therefore usually requires a simulator in which to perform training. In many real-world domains, it is impossible to construct a perfectly accurate model or simulator, so the performance of any policy is inevitably uncertain due to incomplete knowledge of the environment. Furthermore, in stochastic domains, the outcome of any given run is also uncertain due to the inherent randomness of the environment. These two sources of uncertainty are usually classified as epistemic and aleatoric uncertainty, respectively. The overarching goal of this thesis is to contribute to developing algorithms that mitigate both sources of uncertainty in sequential decision-making problems.

We make a number of contributions towards this goal, with a focus on model-based algorithms. We begin by considering the simplest case, where the Markov decision process (MDP) is fully known, and propose a method for optimising a risk-averse objective while optimising the expected value as a secondary objective. For the remainder of the thesis, we no longer assume that the MDP is fully specified. We consider several different representations of the uncertainty over the MDP: a) an uncertainty set of candidate MDPs, b) a prior distribution over MDPs, and c) a fixed dataset of interactions with the MDP. In setting a), we propose a new approach to approximating the minimax regret objective, finding a single policy with low sub-optimality across all candidate MDPs. In b), we propose to optimise for risk-aversion in Bayes-adaptive MDPs, averting risks due to both epistemic and aleatoric uncertainty under a single framework. In c), the offline RL setting, we propose two algorithms to overcome the uncertainty that stems from only having access to a fixed dataset: the first is a scalable algorithm for solving a robust MDP formulation of offline RL, and the second is based on risk-sensitive optimisation. In the final contribution chapter, we consider an interactive formulation of learning from demonstration, in which it is necessary to reason about uncertainty in the performance of the current policy in order to selectively choose when to request demonstrations. Empirically, we demonstrate that the algorithms we propose can generate risk-sensitive or robust behaviour in a number of different domains.
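To make the first contribution concrete: a common risk-averse objective in this literature is the conditional value at risk (CVaR) of the return. This record does not name the risk measure used in the thesis, so the lexicographic formulation below is an illustrative sketch under that assumption, with $r_t$ the reward at step $t$ and $\gamma \in [0, 1)$ a discount factor.

% Sketch only: CVaR is assumed as the risk measure; the record does not specify it.
% Step 1: find the set of policies maximising the risk-averse objective.
% Step 2: among those, maximise expected value as a secondary objective.
\[
\Pi^{*} = \operatorname*{arg\,max}_{\pi}\; \mathrm{CVaR}_{\alpha}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big],
\qquad
\pi^{\dagger} \in \operatorname*{arg\,max}_{\pi \in \Pi^{*}}\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big]
\]

Here $\mathrm{CVaR}_{\alpha}[Z] = \mathbb{E}\big[Z \mid Z \le \mathrm{VaR}_{\alpha}(Z)\big]$, the expected return over the worst $\alpha$-fraction of runs, so the primary objective is averse to aleatoric (run-to-run) randomness.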
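For settings a) and c), the two uncertainty-set objectives mentioned above have standard forms. Writing $\mathcal{M}$ for the set of candidate MDPs and $V^{\pi}_{M}$ for the expected value of policy $\pi$ in MDP $M$:

% Robust (maximin) objective: guard against the worst-case candidate MDP.
% Minimax regret objective: minimise worst-case sub-optimality across candidates.
\[
\pi_{\text{robust}} \in \operatorname*{arg\,max}_{\pi}\; \min_{M \in \mathcal{M}} V^{\pi}_{M},
\qquad
\pi_{\text{regret}} \in \operatorname*{arg\,min}_{\pi}\; \max_{M \in \mathcal{M}} \Big( \max_{\pi'} V^{\pi'}_{M} - V^{\pi}_{M} \Big)
\]

Minimax regret measures each policy against the best achievable value in every candidate MDP, matching the abstract's aim of "a single policy with low sub-optimality across all candidate MDPs", whereas the maximin form underlying the robust MDP formulation in setting c) guards against worst-case value directly.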


Bibliographic Details
Main Author: Rigter, M
Other Authors: Hawes, N; Lacerda, B
Format: Thesis
Language: English
Published: 2022
Subjects: Artificial intelligence; Machine learning; Robotics; Operations research
Institution: University of Oxford