Risk-sensitive and robust model-based reinforcement learning and planning

Many sequential decision-making problems that are currently automated, such as those in manufacturing or recommender systems, operate in an environment where there is either little uncertainty or zero risk of catastrophe. As companies and researchers attempt to deploy autonomous systems in less constrained environments, it is increasingly important that we endow sequential decision-making algorithms with the ability to reason about uncertainty and risk.

In this thesis, we address both planning and reinforcement learning (RL) approaches to sequential decision-making. In the planning setting, it is assumed that a model of the environment is provided, and a policy is optimised within that model. Reinforcement learning relies upon extensive random exploration, and therefore usually requires a simulator in which to perform training. In many real-world domains, it is impossible to construct a perfectly accurate model or simulator, so the performance of any policy is inevitably uncertain due to incomplete knowledge of the environment. Furthermore, in stochastic domains, the outcome of any given run is also uncertain due to the inherent randomness of the environment. These two sources of uncertainty are usually classified as epistemic and aleatoric uncertainty, respectively. The overarching goal of this thesis is to contribute to developing algorithms that mitigate both sources of uncertainty in sequential decision-making problems.

We make a number of contributions towards this goal, with a focus on model-based algorithms. We begin by considering the simplest case, where the Markov decision process (MDP) is fully known, and propose a method for optimising a risk-averse objective while optimising the expected value as a secondary objective. For the remainder of the thesis, we no longer assume that the MDP is fully specified. We consider several different representations of the uncertainty over the MDP: a) an uncertainty set of candidate MDPs, b) a prior distribution over MDPs, and c) a fixed dataset of interactions with the MDP. In setting a), we propose a new approach to approximating the minimax regret objective, finding a single policy with low sub-optimality across all candidate MDPs. In b), we propose to optimise for risk-aversion in Bayes-adaptive MDPs, averting risks due to both epistemic and aleatoric uncertainty under a single framework. In c), the offline RL setting, we propose two algorithms to overcome the uncertainty that stems from only having access to a fixed dataset: the first is a scalable algorithm for solving a robust MDP formulation of offline RL, and the second is based on risk-sensitive optimisation. In the final contribution chapter, we consider an interactive formulation of learning from demonstration, in which it is necessary to reason about uncertainty in the performance of the current policy in order to selectively choose when to request demonstrations. Empirically, we demonstrate that the algorithms we propose can generate risk-sensitive or robust behaviour in a number of different domains.
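To make the first contribution concrete: a common risk-averse objective in this literature is the conditional value at risk (CVaR) of the return. This record does not name the risk measure used in the thesis, so the lexicographic formulation below is an illustrative sketch under that assumption, with $r_t$ the reward at step $t$ and $\gamma \in [0, 1)$ a discount factor.

% Sketch only: CVaR is assumed as the risk measure; the record does not specify it.
% Step 1: find the set of policies maximising the risk-averse objective.
% Step 2: among those, maximise expected value as a secondary objective.
\[
\Pi^{*} = \operatorname*{arg\,max}_{\pi}\; \mathrm{CVaR}_{\alpha}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big],
\qquad
\pi^{\dagger} \in \operatorname*{arg\,max}_{\pi \in \Pi^{*}}\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big]
\]

Here $\mathrm{CVaR}_{\alpha}[Z] = \mathbb{E}\big[Z \mid Z \le \mathrm{VaR}_{\alpha}(Z)\big]$, the expected return over the worst $\alpha$-fraction of runs, so the primary objective is averse to aleatoric (run-to-run) randomness.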
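For settings a) and c), the two uncertainty-set objectives mentioned above have standard forms. Writing $\mathcal{M}$ for the set of candidate MDPs and $V^{\pi}_{M}$ for the expected value of policy $\pi$ in MDP $M$:

% Robust (maximin) objective: guard against the worst-case candidate MDP.
% Minimax regret objective: minimise worst-case sub-optimality across candidates.
\[
\pi_{\text{robust}} \in \operatorname*{arg\,max}_{\pi}\; \min_{M \in \mathcal{M}} V^{\pi}_{M},
\qquad
\pi_{\text{regret}} \in \operatorname*{arg\,min}_{\pi}\; \max_{M \in \mathcal{M}} \Big( \max_{\pi'} V^{\pi'}_{M} - V^{\pi}_{M} \Big)
\]

Minimax regret measures each policy against the best achievable value in every candidate MDP, matching the abstract's aim of "a single policy with low sub-optimality across all candidate MDPs", whereas the maximin form underlying the robust MDP formulation in setting c) guards against worst-case value directly.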


Bibliographic Details
Main Author: Rigter, M
Other Authors: Hawes, N; Lacerda, B
Format: Thesis
Language: English
Published: 2022
Subjects: Artificial intelligence; Machine learning; Robotics; Operations research
Institution: University of Oxford