Rank2Reward: Learning Robot Reward Functions from Passive Video

Bibliographic Details
Main Author: Yang, Daniel Xin
Other Authors: Agrawal, Pulkit
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Online Access: https://hdl.handle.net/1721.1/151463
Description
Summary: Teaching robots novel skills from demonstrations via human-in-the-loop data collection techniques such as kinesthetic teaching or teleoperation is a promising approach, but it places a heavy data-collection burden on human supervisors and requires instrumentation for inferring states and actions. In contrast to this paradigm, it is often significantly easier to obtain visual data of tasks being performed. Ideally, this data can guide robot learning for new tasks in novel environments, informing both what to do and how to do it. A powerful way to encode both what to do and how to do it in the absence of low-level states and actions is to infer a well-shaped reward function for reinforcement learning. The challenge is determining how to ground visual demonstration inputs into a well-shaped and informative reward function. To this end, we propose a technique, Rank2Reward, for learning behaviors from videos of tasks being performed, without access to any low-level states or actions. We do so by leveraging the videos to learn a reward function that measures incremental "progress" through a task, obtained by learning to rank the frames of a demonstration video in temporal order. By inferring an appropriate ranking, the reward function quickly indicates when task progress is being made, guiding reinforcement learning to learn the task efficiently in new scenarios. We demonstrate the effectiveness of this simple technique at learning behaviors directly from raw video on a number of tasks in simulation as well as several tasks on a real-world robotic arm.
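
The core idea described above, learning a utility that ranks demonstration frames by time and using it as a shaped reward, can be illustrated with a minimal sketch. Everything below (the `ProgressNet` architecture, the `ranking_loss` helper, and the frame-pair sampling) is a hypothetical PyTorch illustration of a pairwise ranking objective, not the thesis's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressNet(nn.Module):
    """Scalar 'progress' utility u(frame); frames later in a demonstration
    video should receive higher utility than earlier ones."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # toy CNN; a stronger visual encoder is assumed in practice
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, frames):                    # frames: (B, 3, H, W)
        return self.encoder(frames).squeeze(-1)   # (B,) utilities

def ranking_loss(net, earlier, later):
    """Pairwise (Bradley-Terry style) objective: classify which of two
    frames from the same video occurs later, using the utility gap."""
    logits = net(later) - net(earlier)            # > 0 when ranked correctly
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

def sample_frame_pairs(video, batch_size=32):
    """Sample (earlier, later) frame pairs from one video of shape (T, 3, H, W)."""
    T = video.shape[0]
    i = torch.randint(0, T - 1, (batch_size,))
    j = torch.stack([torch.randint(int(a) + 1, T, ()) for a in i])
    return video[i], video[j]

# Training sketch; at reinforcement-learning time, the learned utility of
# the current camera frame serves directly as a shaped reward signal.
net = ProgressNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
demo = torch.rand(100, 3, 64, 64)                 # stand-in demonstration video
for _ in range(10):
    earlier, later = sample_frame_pairs(demo)
    loss = ranking_loss(net, earlier, later)
    opt.zero_grad(); loss.backward(); opt.step()
reward = net(demo[-1:]).item()                    # reward for an observed frame
```

Under this reading, a correctly ranked utility rises monotonically along a demonstration, so the policy receives dense positive feedback whenever its observations resemble later stages of the demonstrated task.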