Imitation Learning for Sequential Manipulation Tasks: Leveraging Language and Perception

As robots are increasingly being utilized to perform automated tasks, effective methods for transferring task specifications to robots have become imperative. However, existing techniques for training robots to perform tasks often depend on rote mimicry of human demonstrations and do not generalize well to new tasks or contexts. In addition, learning an end-to-end policy for performing a sequence of operations for a high-level goal remains a challenge. Transferring sequential task specifications is a difficult objective, as it requires extensive human intervention to establish the structure of the task including the constraints, objects of interest, and control parameters. In this thesis, we present an imitation learning framework for sequential manipulation tasks that enables humans to easily communicate abstract high-level task goals to the robot without explicit programming or robotics expertise. We introduce natural language input to the system to facilitate the learning of task specifications. During training, a human teacher provides demonstrations and a verbal description of the task being performed. The training process then learns a mapping from the multi-modal inputs to the low-level control policies. During execution, the high-level task instruction input is parsed into a list of sub-tasks that the robot has learned to perform. The presented framework is evaluated in a simulated table-top scenario of a robotic arm performing sorting and kitting tasks from natural language commands. The approach developed in this thesis achieved an overall task completion rate of 91.16% on 600 novel task scenes, with a sub-task execution success rate of 96.44% on 1,712 individual “pick” and “place” tasks.
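To make the execution-time pipeline in the abstract concrete: a high-level instruction is parsed into a list of sub-tasks, and each sub-task is dispatched to a learned low-level control policy. The following is a minimal sketch of that control flow, not the thesis's actual implementation; every name in it (SubTask, parse_instruction, POLICIES) is hypothetical, and the rule-based parser stands in for the learned language component.

    # Hedged sketch of the execution pipeline described in the abstract.
    # All names are hypothetical; the thesis does not publish this API.
    from dataclasses import dataclass

    @dataclass
    class SubTask:
        action: str   # e.g. "pick" or "place"
        target: str   # object or location referenced in the command

    def parse_instruction(command: str) -> list[SubTask]:
        # Toy rule-based stand-in for the learned language parser:
        # "put the red block in the kitting tray" ->
        # [pick(the red block), place(the kitting tray)]
        words = command.lower().split()
        if "in" in words:
            obj = " ".join(words[1:words.index("in")])
            dest = " ".join(words[words.index("in") + 1:])
        else:
            obj, dest = command, "table"
        return [SubTask("pick", obj), SubTask("place", dest)]

    # Stand-ins for the learned low-level control policies.
    POLICIES = {
        "pick": lambda t: print(f"running learned pick policy on {t}"),
        "place": lambda t: print(f"running learned place policy at {t}"),
    }

    for sub in parse_instruction("put the red block in the kitting tray"):
        POLICIES[sub.action](sub.target)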

Bibliographic Details
Main Author: Kim, Dain
Other Authors: Shah, Julie A.; Figueroa, Nadia
Format: Thesis
Degree: M.Eng.
Department: Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/139416
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)