Rank2Reward: Learning Robot Reward Functions from Passive Video
Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques such as kinesthetic teaching or teleoperation is a promising approach, but it places a heavy burden of data collection on human supervisors and requires instrumentation for inferring states and actions. In contrast to this paradigm, it is often significantly easier to obtain visual data of tasks being performed. Ideally, this data can serve to guide robot learning for new tasks in novel environments, informing both what to do and how to do it. A powerful way to encode both what to do and how to do it in the absence of low-level states and actions is by inferring a well-shaped reward function for reinforcement learning. The challenge is determining how to ground visual demonstration inputs into a well-shaped and informative reward function for reinforcement learning. To this end, we propose a technique, Rank2Reward, for learning behaviors from videos of tasks being performed, without access to any low-level states and actions. We do so by leveraging the videos to learn a reward function that measures incremental “progress” through a task by learning how to rank the video frames of a demonstration in order. By inferring an appropriate ranking, the reward function can quickly indicate when task progress is being made, guiding reinforcement learning to quickly learn the task in new scenarios. We demonstrate the effectiveness of this simple technique at learning behaviors directly from raw video on a number of tasks in simulation as well as several tasks on a real-world robotic arm.
Main Author: | Yang, Daniel Xin |
---|---|
Other Authors: | Agrawal, Pulkit |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/151463 |
_version_ | 1826210096190849024 |
---|---|
author | Yang, Daniel Xin |
author2 | Agrawal, Pulkit |
author_facet | Agrawal, Pulkit Yang, Daniel Xin |
author_sort | Yang, Daniel Xin |
collection | MIT |
description | Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques such as kinesthetic teaching or teleoperation is a promising approach, but it places a heavy burden of data collection on human supervisors and requires instrumentation for inferring states and actions. In contrast to this paradigm, it is often significantly easier to obtain visual data of tasks being performed. Ideally, this data can serve to guide robot learning for new tasks in novel environments, informing both what to do and how to do it. A powerful way to encode both what to do and how to do it in the absence of low-level states and actions is by inferring a well-shaped reward function for reinforcement learning. The challenge is determining how to ground visual demonstration inputs into a well-shaped and informative reward function for reinforcement learning. To this end, we propose a technique, Rank2Reward, for learning behaviors from videos of tasks being performed, without access to any low-level states and actions. We do so by leveraging the videos to learn a reward function that measures incremental “progress” through a task by learning how to rank the video frames of a demonstration in order. By inferring an appropriate ranking, the reward function can quickly indicate when task progress is being made, guiding reinforcement learning to quickly learn the task in new scenarios. We demonstrate the effectiveness of this simple technique at learning behaviors directly from raw video on a number of tasks in simulation as well as several tasks on a real-world robotic arm. |
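The core idea in the abstract, learning a per-frame utility by ranking demonstration frames in temporal order so that the utility doubles as a dense progress reward, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a linear utility model, synthetic feature vectors standing in for image embeddings, and a Bradley-Terry-style pairwise ranking objective trained by stochastic gradient ascent in NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for per-frame image embeddings of one demonstration video:
# 50 frames of 8-D features drifting in a fixed direction over time.
T, D = 50, 8
direction = rng.normal(size=D)
frames = np.linspace(0.0, 1.0, T)[:, None] * direction + 0.1 * rng.normal(size=(T, D))

w = np.zeros(D)  # linear "progress" utility u(x) = w @ x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Pairwise ranking loss: for frames i < j in the demo,
# push u(frames[j]) above u(frames[i]) (Bradley-Terry likelihood).
lr = 0.5
for _ in range(200):
    i, j = sorted(rng.choice(T, size=2, replace=False))
    diff = frames[j] - frames[i]
    p = sigmoid(w @ diff)       # P(frame j is ranked after frame i)
    w += lr * (1.0 - p) * diff  # gradient ascent on the log-likelihood

# The learned utility now increases along the demonstration,
# so it can serve as a dense reward signal for reinforcement learning.
rewards = frames @ w
```

In this toy setup, later frames receive a higher utility than earlier ones, which is the property the abstract relies on: a policy that makes task progress is rewarded immediately rather than only at task completion.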
first_indexed | 2024-09-23T14:42:50Z |
format | Thesis |
id | mit-1721.1/151463 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T14:42:50Z |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1514632023-08-01T03:20:59Z Rank2Reward: Learning Robot Reward Functions from Passive Video Yang, Daniel Xin Agrawal, Pulkit Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques such as kinesthetic teaching or teleoperation is a promising approach, but it places a heavy burden of data collection on human supervisors and requires instrumentation for inferring states and actions. In contrast to this paradigm, it is often significantly easier to obtain visual data of tasks being performed. Ideally, this data can serve to guide robot learning for new tasks in novel environments, informing both what to do and how to do it. A powerful way to encode both what to do and how to do it in the absence of low-level states and actions is by inferring a well-shaped reward function for reinforcement learning. The challenge is determining how to ground visual demonstration inputs into a well-shaped and informative reward function for reinforcement learning. To this end, we propose a technique, Rank2Reward, for learning behaviors from videos of tasks being performed, without access to any low-level states and actions. We do so by leveraging the videos to learn a reward function that measures incremental “progress” through a task by learning how to rank the video frames of a demonstration in order. By inferring an appropriate ranking, the reward function can quickly indicate when task progress is being made, guiding reinforcement learning to quickly learn the task in new scenarios. We demonstrate the effectiveness of this simple technique at learning behaviors directly from raw video on a number of tasks in simulation as well as several tasks on a real-world robotic arm. S.M.
2023-07-31T19:41:36Z 2023-07-31T19:41:36Z 2023-06 2023-07-13T14:30:21.300Z Thesis https://hdl.handle.net/1721.1/151463 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Yang, Daniel Xin Rank2Reward: Learning Robot Reward Functions from Passive Video |
title | Rank2Reward: Learning Robot Reward Functions from Passive Video |
title_full | Rank2Reward: Learning Robot Reward Functions from Passive Video |
title_fullStr | Rank2Reward: Learning Robot Reward Functions from Passive Video |
title_full_unstemmed | Rank2Reward: Learning Robot Reward Functions from Passive Video |
title_short | Rank2Reward: Learning Robot Reward Functions from Passive Video |
title_sort | rank2reward learning robot reward functions from passive video |
url | https://hdl.handle.net/1721.1/151463 |
work_keys_str_mv | AT yangdanielxin rank2rewardlearningrobotrewardfunctionsfrompassivevideo |