Rank2Reward: Learning Robot Reward Functions from Passive Video

Teaching robots novel skills from demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation is a promising approach, but it places a heavy burden on human supervisors, both for data collection and for the instrumentation needed to infer states and actions. In contrast, it is often significantly easier to obtain visual data of tasks being performed. Ideally, this data can guide robot learning for new tasks in novel environments, informing both what to do and how to do it. A powerful way to encode both what to do and how to do it, in the absence of low-level states and actions, is to infer a well-shaped reward function for reinforcement learning. The challenge is grounding visual demonstration inputs into such a well-shaped, informative reward function. To this end, we propose a technique, Rank2Reward, for learning behaviors from videos of tasks being performed, without access to any low-level states or actions. We leverage the videos to learn a reward function that measures incremental "progress" through a task by learning to rank a demonstration's video frames in temporal order. By inferring an appropriate ranking, the reward function quickly indicates when task progress is being made, guiding reinforcement learning to learn the task efficiently in new scenarios. We demonstrate the effectiveness of this simple technique at learning behaviors directly from raw video on a number of tasks in simulation as well as several tasks on a real-world robotic arm.
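The core idea — ranking a demonstration's frames by temporal order and reusing the learned ranking score as a progress reward — can be sketched in miniature. The toy below is an illustrative assumption, not the thesis's implementation: it stands in for learned video-frame embeddings with hand-made 2-D features and fits a linear utility with a Bradley-Terry pairwise ranking loss, whereas the actual method learns from raw pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "frame embeddings": a demonstration of T frames whose features drift
# with task progress (a stand-in for learned embeddings of video frames).
T, dim = 20, 2
demo = np.cumsum(rng.normal(0.3, 0.1, size=(T, dim)), axis=0)

# Linear utility u(x) = w.x, trained so that for frames i < j in the demo,
# u(frame_j) > u(frame_i) under a Bradley-Terry pairwise ranking loss.
w = np.zeros(dim)

def ranking_loss_grad(w, earlier, later):
    # P(later ranked above earlier) = sigmoid(u(later) - u(earlier))
    diff = (later - earlier) @ w
    p = 1.0 / (1.0 + np.exp(-diff))
    # Gradient of -log p with respect to w, averaged over all pairs.
    return -((1.0 - p)[:, None] * (later - earlier)).mean(axis=0)

# All ordered frame pairs (i, j) with i < j from the demonstration.
idx_i, idx_j = np.triu_indices(T, k=1)
earlier, later = demo[idx_i], demo[idx_j]

# Plain gradient descent on the ranking loss.
for _ in range(200):
    w -= 0.5 * ranking_loss_grad(w, earlier, later)

# The learned utility of each frame serves as a per-frame progress reward.
reward = demo @ w
```

Because the ranking objective only compares frames within a demonstration, the resulting `reward` is shaped to increase along the task, which is exactly what makes it informative for reinforcement learning in new rollouts.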

Bibliographic Details
Main Author: Yang, Daniel Xin
Other Authors: Agrawal, Pulkit
Department: Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Degree: S.M.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s)
Online Access: https://hdl.handle.net/1721.1/151463