Inverse reinforcement learning from failure
*Inverse reinforcement learning* (IRL) allows autonomous agents to learn to solve complex tasks from successful demonstrations. However, in many settings, e.g., when a human learns the task by trial and error, *failed* demonstrations are also readily available. In addition, in some tasks, purposely generating failed demonstrations may be easier than generating successful ones. Since existing IRL methods cannot make use of failed demonstrations, in this paper we propose *inverse reinforcement learning from failure* (IRLF), which exploits both successful and failed demonstrations. Starting from the state-of-the-art *maximum causal entropy IRL* method, we propose a new constrained optimisation formulation that accommodates both types of demonstrations while remaining convex. We then derive update rules for learning reward functions and policies. Experiments on both simulated and real-robot data demonstrate that IRLF converges faster and generalises better than maximum causal entropy IRL, especially when few successful demonstrations are available.
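For orientation, the sketch below shows the standard maximum causal entropy IRL problem that the abstract takes as its starting point, together with one *illustrative* way failed demonstrations could enter as an additional feature-expectation constraint. The notation ($\pi$ for the policy, $\phi$ for reward features, $\mu_S$ and $\mu_F$ for the empirical feature expectations of successful and failed demonstrations, $z$ for a slack variable) is assumed here for exposition; it is not reproduced from the paper, and the failure constraint shown is a plausible stand-in rather than IRLF's exact formulation.

```latex
% A minimal sketch, assuming linear reward features phi and a discrete MDP.
% The failure constraint is illustrative only, not the paper's formulation.
\begin{align*}
\max_{\pi}\quad & H(\pi) = \mathbb{E}_{\pi}\!\left[-\sum_{t}\log \pi(a_t \mid s_t)\right]
  && \text{(causal entropy of the policy)}\\
\text{s.t.}\quad & \mathbb{E}_{\pi}\!\left[\sum_{t}\phi(s_t,a_t)\right] = \mu_S
  && \text{(match successful demonstrations)}\\
& \mathbb{E}_{\pi}\!\left[\sum_{t}\phi(s_t,a_t)\right] - \mu_F = z
  && \text{(illustrative failure constraint, slack } z\text{)}\\
& \pi(\cdot \mid s) \in \Delta(\mathcal{A}) \quad \forall s
  && \text{(valid policy)}
\end{align*}
% Encouraging a large gap z (e.g., via a concave bonus on z added to the
% objective) would push the policy away from the failed demonstrations
% while keeping the overall problem convex.
```

In the maximum causal entropy setting, the Lagrange multipliers attached to the successful-demonstration constraint play the role of the reward weights, so the "update rules for learning reward functions and policies" mentioned in the abstract would plausibly correspond to alternating gradient steps on such multipliers and soft-Bellman updates of the policy.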
Main Authors: | Shiarlis, K; Messias, J; Whiteson, S |
---|---|
Format: | Conference item |
Published: | International Foundation for Autonomous Agents and Multiagent Systems, 2016 |
author | Shiarlis, K; Messias, J; Whiteson, S |
---|---|
collection | OXFORD |
format | Conference item |
id | oxford-uuid:8593deb6-f16c-4545-a732-472625eaffb3 |
institution | University of Oxford |
publishDate | 2016 |
publisher | International Foundation for Autonomous Agents and Multiagent Systems |
title | Inverse reinforcement learning from failure |