Imitation Learning for Sequential Manipulation Tasks: Leveraging Language and Perception

As robots are increasingly being utilized to perform automated tasks, effective methods for transferring task specifications to robots have become imperative. However, existing techniques for training robots to perform tasks often depend on rote mimicry of human demonstrations and do not generalize well to new tasks or contexts. In addition, learning an end-to-end policy for performing a sequence of operations for a high-level goal remains a challenge. Transferring sequential task specifications is a difficult objective, as it requires extensive human intervention to establish the structure of the task including the constraints, objects of interest, and control parameters. In this thesis, we present an imitation learning framework for sequential manipulation tasks that enables humans to easily communicate abstract high-level task goals to the robot without explicit programming or robotics expertise. We introduce natural language input to the system to facilitate the learning of task specifications. During training, a human teacher provides demonstrations and a verbal description of the task being performed. The training process then learns a mapping from the multi-modal inputs to the low-level control policies. During execution, the high-level task instruction input is parsed into a list of sub-tasks that the robot has learned to perform. The presented framework is evaluated in a simulated table-top scenario of a robotic arm performing sorting and kitting tasks from natural language commands. The approach developed in this thesis achieved an overall task completion rate of 91.16% on 600 novel task scenes, with a sub-task execution success rate of 96.44% on 1,712 individual “pick” and “place” tasks.
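To make the execution-time pipeline in the abstract concrete: a high-level instruction is parsed into a list of sub-tasks, and each sub-task is dispatched to a learned low-level control policy. The following is a minimal sketch of that control flow, not the thesis's actual implementation; every name in it (SubTask, parse_instruction, POLICIES) is hypothetical, and the rule-based parser stands in for the learned language component.

    # Hedged sketch of the execution pipeline described in the abstract.
    # All names are hypothetical; the thesis does not publish this API.
    from dataclasses import dataclass

    @dataclass
    class SubTask:
        action: str   # e.g. "pick" or "place"
        target: str   # object or location referenced in the command

    def parse_instruction(command: str) -> list[SubTask]:
        # Toy rule-based stand-in for the learned language parser:
        # "put the red block in the kitting tray" ->
        # [pick(the red block), place(the kitting tray)]
        words = command.lower().split()
        if "in" in words:
            obj = " ".join(words[1:words.index("in")])
            dest = " ".join(words[words.index("in") + 1:])
        else:
            obj, dest = command, "table"
        return [SubTask("pick", obj), SubTask("place", dest)]

    # Stand-ins for the learned low-level control policies.
    POLICIES = {
        "pick": lambda t: print(f"running learned pick policy on {t}"),
        "place": lambda t: print(f"running learned place policy at {t}"),
    }

    for sub in parse_instruction("put the red block in the kitting tray"):
        POLICIES[sub.action](sub.target)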

Bibliographic Details
Main Author: Kim, Dain
Other Authors: Shah, Julie A.; Figueroa, Nadia
Format: Thesis
Degree: M.Eng.
Department: Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/139416
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)