Rank2Reward: Learning Robot Reward Functions from Passive Video

Teaching robots novel skills from demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation is a promising approach, but it places a heavy burden on human supervisors, both for data collection and for the instrumentation needed to infer states and actions. In contrast, it is often significantly easier to obtain visual data of tasks being performed. Ideally, this data can guide robot learning for new tasks in novel environments, informing both what to do and how to do it. A powerful way to encode both what to do and how to do it, in the absence of low-level states and actions, is to infer a well-shaped reward function for reinforcement learning. The challenge is grounding visual demonstration inputs into such a well-shaped, informative reward function. To this end, we propose a technique, Rank2Reward, for learning behaviors from videos of tasks being performed, without access to any low-level states or actions. We leverage the videos to learn a reward function that measures incremental "progress" through a task by learning to rank a demonstration's video frames in temporal order. By inferring an appropriate ranking, the reward function quickly indicates when task progress is being made, guiding reinforcement learning to learn the task efficiently in new scenarios. We demonstrate the effectiveness of this simple technique at learning behaviors directly from raw video on a number of tasks in simulation as well as several tasks on a real-world robotic arm.
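The core idea — ranking a demonstration's frames by temporal order and reusing the learned ranking score as a progress reward — can be sketched in miniature. The toy below is an illustrative assumption, not the thesis's implementation: it stands in for learned video-frame embeddings with hand-made 2-D features and fits a linear utility with a Bradley-Terry pairwise ranking loss, whereas the actual method learns from raw pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "frame embeddings": a demonstration of T frames whose features drift
# with task progress (a stand-in for learned embeddings of video frames).
T, dim = 20, 2
demo = np.cumsum(rng.normal(0.3, 0.1, size=(T, dim)), axis=0)

# Linear utility u(x) = w.x, trained so that for frames i < j in the demo,
# u(frame_j) > u(frame_i) under a Bradley-Terry pairwise ranking loss.
w = np.zeros(dim)

def ranking_loss_grad(w, earlier, later):
    # P(later ranked above earlier) = sigmoid(u(later) - u(earlier))
    diff = (later - earlier) @ w
    p = 1.0 / (1.0 + np.exp(-diff))
    # Gradient of -log p with respect to w, averaged over all pairs.
    return -((1.0 - p)[:, None] * (later - earlier)).mean(axis=0)

# All ordered frame pairs (i, j) with i < j from the demonstration.
idx_i, idx_j = np.triu_indices(T, k=1)
earlier, later = demo[idx_i], demo[idx_j]

# Plain gradient descent on the ranking loss.
for _ in range(200):
    w -= 0.5 * ranking_loss_grad(w, earlier, later)

# The learned utility of each frame serves as a per-frame progress reward.
reward = demo @ w
```

Because the ranking objective only compares frames within a demonstration, the resulting `reward` is shaped to increase along the task, which is exactly what makes it informative for reinforcement learning in new rollouts.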

Bibliographic Details
Main Author: Yang, Daniel Xin
Other Authors: Agrawal, Pulkit
Department: Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Degree: S.M.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s)
Online Access: https://hdl.handle.net/1721.1/151463