OFFER: Off-environment reinforcement learning

Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables: state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.
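
The abstract confirms only the core idea: sample the environment variable from a learned proposal distribution q rather than its true distribution p, and correct the policy gradient with importance weights p/q so that significant rare events are seen often enough to learn from. The sketch below illustrates that idea in a toy one-step crash/brace problem; the environment, return values, REINFORCE-style policy update, and variance-reduction proposal update are all illustrative assumptions, not the update rules derived in the paper.

```python
# Toy sketch of OFFER's core idea: a rare gust (p = 0.01) crashes the agent
# unless it braces. Plain policy gradient almost never samples the gust;
# here we sample it from a learned proposal q and reweight by p/q.
import numpy as np

rng = np.random.default_rng(0)

P_GUST = 0.01  # true probability of the significant rare event

def episode_return(braced, gusty):
    if gusty and not braced:
        return -500.0   # crash: rare, but dominates expected return
    if braced:
        return -1.0     # small fixed cost of bracing
    return 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

policy_logit = 0.0    # policy: P(brace) = sigmoid(policy_logit)
proposal_logit = 0.0  # proposal: q(gust) = sigmoid(proposal_logit)

for step in range(5000):
    q_gust = sigmoid(proposal_logit)
    gusty = rng.random() < q_gust            # env variable drawn from q, not p
    p_env = P_GUST if gusty else 1.0 - P_GUST
    q_env = q_gust if gusty else 1.0 - q_gust
    w = p_env / q_env                        # importance weight p/q

    p_brace = sigmoid(policy_logit)
    braced = rng.random() < p_brace
    ret = episode_return(braced, gusty)

    # Importance-weighted REINFORCE update for the policy.
    grad_logp = float(braced) - p_brace
    policy_logit += 0.05 * w * ret * grad_logp

    # Shift the proposal in the direction that reduces the variance of the
    # weighted return estimate (gradient of E_q[(w*ret)^2] w.r.t. q is
    # -E_q[(w*ret)^2 * grad log q]); a heuristic stand-in for the paper's update.
    grad_logq = float(gusty) - q_gust
    proposal_logit += 0.001 * (w * ret) ** 2 * grad_logq
    proposal_logit = np.clip(proposal_logit, -4.0, 4.0)  # keep q away from 0 and 1

print(f"P(brace) = {sigmoid(policy_logit):.3f}, q(gust) = {sigmoid(proposal_logit):.3f}")
```

In this toy problem the proposal quickly drifts toward oversampling the gust, so the crash outcome appears often enough for the importance-weighted update to push P(brace) toward 1, whereas sampling gusts at their true 1% rate would leave the policy dominated by the small bracing cost.
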

Bibliographic Details
Main Authors: Ciosek, K, Whiteson, S
Format: Conference item
Language: English
Published: AAAI Press 2017