OFFER: Off-environment reinforcement learning

Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables: state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.
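
The abstract confirms only the core idea: sample the environment variable from a learned proposal distribution q rather than its true distribution p, and correct the policy gradient with importance weights p/q so that significant rare events are seen often enough to learn from. The sketch below illustrates that idea in a toy one-step crash/brace problem; the environment, return values, REINFORCE-style policy update, and variance-reduction proposal update are all illustrative assumptions, not the update rules derived in the paper.

```python
# Toy sketch of OFFER's core idea: a rare gust (p = 0.01) crashes the agent
# unless it braces. Plain policy gradient almost never samples the gust;
# here we sample it from a learned proposal q and reweight by p/q.
import numpy as np

rng = np.random.default_rng(0)

P_GUST = 0.01  # true probability of the significant rare event

def episode_return(braced, gusty):
    if gusty and not braced:
        return -500.0   # crash: rare, but dominates expected return
    if braced:
        return -1.0     # small fixed cost of bracing
    return 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

policy_logit = 0.0    # policy: P(brace) = sigmoid(policy_logit)
proposal_logit = 0.0  # proposal: q(gust) = sigmoid(proposal_logit)

for step in range(5000):
    q_gust = sigmoid(proposal_logit)
    gusty = rng.random() < q_gust            # env variable drawn from q, not p
    p_env = P_GUST if gusty else 1.0 - P_GUST
    q_env = q_gust if gusty else 1.0 - q_gust
    w = p_env / q_env                        # importance weight p/q

    p_brace = sigmoid(policy_logit)
    braced = rng.random() < p_brace
    ret = episode_return(braced, gusty)

    # Importance-weighted REINFORCE update for the policy.
    grad_logp = float(braced) - p_brace
    policy_logit += 0.05 * w * ret * grad_logp

    # Shift the proposal in the direction that reduces the variance of the
    # weighted return estimate (gradient of E_q[(w*ret)^2] w.r.t. q is
    # -E_q[(w*ret)^2 * grad log q]); a heuristic stand-in for the paper's update.
    grad_logq = float(gusty) - q_gust
    proposal_logit += 0.001 * (w * ret) ** 2 * grad_logq
    proposal_logit = np.clip(proposal_logit, -4.0, 4.0)  # keep q away from 0 and 1

print(f"P(brace) = {sigmoid(policy_logit):.3f}, q(gust) = {sigmoid(proposal_logit):.3f}")
```

In this toy problem the proposal quickly drifts toward oversampling the gust, so the crash outcome appears often enough for the importance-weighted update to push P(brace) toward 1, whereas sampling gusts at their true 1% rate would leave the policy dominated by the small bracing cost.
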

Bibliographic Details
Main Authors: Ciosek, K, Whiteson, S
Format: Conference item
Language: English
Published: AAAI Press 2017