Robust reinforcement learning with Bayesian optimisation and quadrature

Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are...

Bibliographic Details
Main Authors:	Paul, S; Chatzilygeroudis, K; Ciosek, K; Mouret, J-B; Osborne, MA; Whiteson, S
Format:	Journal article
Language:	English
Published:	Journal of Machine Learning Research, 2020

collection	OXFORD
description	Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present Alternating Optimisation and Quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present Transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
first_indexed	2024-03-06T19:11:28Z
id	oxford-uuid:16eca5db-614b-4401-acf0-6dc29bf9e49e
institution	University of Oxford
last_indexed	2024-03-06T19:11:28Z
record_format	dspace
