Inverse reinforcement learning from failure

Inverse reinforcement learning (IRL) allows autonomous agents to learn to solve complex tasks from successful demonstrations. However, in many settings, e.g., when a human learns the task by trial and error, failed demonstrations are also readily available. In addition, in some tasks, purposely generating failed demonstrations may be easier than generating successful ones. Since existing IRL methods cannot make use of failed demonstrations, in this paper we propose inverse reinforcement learning from failure (IRLF), which exploits both successful and failed demonstrations. Starting from the state-of-the-art maximum causal entropy IRL method, we propose a new constrained optimisation formulation that accommodates both types of demonstrations while remaining convex. We then derive update rules for learning reward functions and policies. Experiments on both simulated and real-robot data demonstrate that IRLF converges faster and generalises better than maximum causal entropy IRL, especially when few successful demonstrations are available.
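The abstract only outlines the approach, so the sketch below is a rough, hypothetical illustration rather than the authors' actual formulation: a linear reward over features, soft (maximum-entropy-style) value iteration for the policy, and a gradient step that pulls the policy's feature expectations towards those of the successful demonstrations while pushing them away from those of the failed ones. The fixed weight w_fail stands in for the multiplier that the paper's convex constrained optimisation would learn; all function names, array shapes, and constants here are assumptions.

import numpy as np

def soft_policy(theta, features, transitions, gamma=0.95, iters=200):
    # Soft (maximum-entropy-style) value iteration on a small tabular MDP.
    # features: [S, A, K] feature array; transitions: [S, A, S] probabilities.
    rewards = features @ theta                    # [S, A] linear reward
    V = np.zeros(features.shape[0])
    for _ in range(iters):
        Q = rewards + gamma * transitions @ V     # [S, A] soft Q-values
        V = np.log(np.exp(Q).sum(axis=1))         # soft max over actions
    return np.exp(Q - V[:, None])                 # stochastic policy, [S, A]

def feature_expectations(policy, features, transitions, start, gamma=0.95, horizon=100):
    # Discounted expected feature counts when following `policy` from the
    # start-state distribution `start` (shape [S]).
    d, disc = start.copy(), 1.0
    mu = np.zeros(features.shape[2])
    for _ in range(horizon):
        sa = d[:, None] * policy                  # state-action occupancy, [S, A]
        mu += disc * np.einsum('sa,sak->k', sa, features)
        d = np.einsum('sa,sat->t', sa, transitions)
        disc *= gamma
    return mu

def irlf_style_step(theta, mu_success, mu_failed, features, transitions, start,
                    lr=0.1, w_fail=0.5):
    # One gradient step on the reward weights: move the policy's feature
    # expectations towards the successful demonstrations and, with weight
    # w_fail, away from the failed ones. In the paper this trade-off is
    # governed by a learned multiplier; the fixed w_fail here is an assumption.
    policy = soft_policy(theta, features, transitions)
    mu_pi = feature_expectations(policy, features, transitions, start)
    grad = (mu_success - mu_pi) - w_fail * (mu_failed - mu_pi)
    return theta + lr * grad

In the paper itself this trade-off is handled inside a convex constrained optimisation from which update rules for both the reward function and the policy are derived; the sketch above only conveys the intuition of learning from successful and failed demonstrations together.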

Bibliographic Details
Main Authors: Shiarlis, K, Messias, J, Whiteson, S
Format: Conference item
Published: International Foundation for Autonomous Agents and Multiagent Systems, 2016
Institution: University of Oxford
Record ID: oxford-uuid:8593deb6-f16c-4545-a732-472625eaffb3