OFFER: Off-environment reinforcement learning
Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables: state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.
Main Authors: | Ciosek, K; Whiteson, S |
---|---|
Format: | Conference item |
Language: | English |
Published: | AAAI Press, 2017 |
id | oxford-uuid:4c5b4f56-bc4d-4617-931c-7487e4c7bd94 |
---|---|
institution | University of Oxford |
collection | OXFORD |
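The abstract above describes the core mechanism: sample environment variables from a learned proposal distribution rather than their true distribution, importance-weight the policy gradient so it stays unbiased, and adapt the proposal so significant rare events are seen often enough to matter. The sketch below is a minimal generic illustration of that idea, not the paper's OFFER update: the one-step toy domain, the `reward` function, all constants, and the variance-proxy proposal update are assumptions made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical toy domain (not from the paper): the environment variable
# w=1 is a significant rare event (a "storm") under the true dynamics.
P_RARE = 0.01

def reward(w, a):
    # Cautious action a=1 sacrifices a little return in normal weather
    # but avoids a catastrophic loss during the rare event.
    if w == 1:
        return 1.0 if a == 1 else -100.0
    return 0.9 if a == 1 else 1.0

theta = 0.0                            # policy: pi(a=1) = sigmoid(theta)
phi = np.log(P_RARE / (1.0 - P_RARE))  # proposal: q(w=1) = sigmoid(phi), start at truth

for step in range(50000):
    q1 = sigmoid(phi)
    w = int(rng.random() < q1)         # draw env variable from the proposal q
    p_w = P_RARE if w == 1 else 1.0 - P_RARE
    q_w = q1 if w == 1 else 1.0 - q1
    rho = p_w / q_w                    # importance weight p(w) / q(w)

    pi1 = sigmoid(theta)
    a = int(rng.random() < pi1)        # draw action from the policy
    R = reward(w, a)

    # Importance-weighted REINFORCE step: unbiased for the true environment
    # even though w was sampled from q.
    g = rho * R * (a - pi1)            # (a - pi1) = d/dtheta log pi(a)
    theta += 0.01 * g

    # Adapt the proposal by descending a variance proxy E_q[g^2];
    # its gradient w.r.t. phi is -E_q[g^2 * d/dphi log q(w)].
    phi += 1e-4 * (g ** 2) * (w - q1)  # (w - q1) = d/dphi log q(w)

print(f"pi(a=1) = {sigmoid(theta):.3f}, q(w=1) = {sigmoid(phi):.3f}")
```

Under naive sampling the storm appears on roughly 1% of episodes, so the non-cautious action looks better despite its catastrophic expected return; oversampling the rare event through q, while reweighting by rho, is what lets the policy gradient find the cautious policy.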