Deep compositional robotic planners that follow natural language commands

Bibliographic Details
Main Authors: Kuo, Yen-Ling, Katz, Boris, Barbu, Andrei
Format: Article
Published: Center for Brains, Minds and Machines (CBMM), Computation and Systems Neuroscience (Cosyne) 2022
Online Access: https://hdl.handle.net/1721.1/141354
author Kuo, Yen-Ling
Katz, Boris
Barbu, Andrei
collection MIT
description We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects. Our approach combines a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes, with a sampling-based planner, RRT. A recurrent hierarchical deep network controls how the planner explores the environment, determines when a planned path is likely to achieve a goal, and estimates the confidence of each move in order to trade off exploitation and exploration between the network and the planner. Planners are designed to have near-optimal behavior when information about the task is missing, while networks learn to exploit observations that are available from the environment, making the two naturally complementary. Combining the two enables generalization to new maps, new kinds of obstacles, and more complex sentences that do not occur in the training set. Little data is required to train the model even though it jointly acquires a CNN that extracts features from the environment as it learns the meanings of words. Although the model is end to end, it provides a level of interpretability through attention maps that let users see its reasoning steps. This end-to-end model allows robots to learn to follow natural language commands in challenging continuous environments.
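
The description above outlines the core control loop: a parse-structured network proposes where the planner should sample next, and a per-move confidence decides whether to follow the network or fall back to the planner's own uniform sampling. The sketch below is not the authors' implementation; it replaces the recurrent hierarchical network, the CNN, and the learned goal predictor with placeholder callables in a toy 2-D configuration space and keeps only that arbitration idea. The names rrt_with_policy, policy_fn, greedy_policy, is_free, and all parameter values are illustrative assumptions, not anything from the memo.

# Minimal sketch (assumed, not the authors' code): RRT whose sampling step is
# biased by a learned policy via a confidence-weighted coin flip.
import math
import random

def rrt_with_policy(start, goal, policy_fn, is_free, n_iters=2000,
                    step_size=0.1, goal_tol=0.15, bounds=(0.0, 1.0)):
    """Grow a tree from `start`. Each iteration follows the network's proposal
    with probability equal to its reported confidence, otherwise it falls back
    to plain RRT's uniform sampling of the configuration space."""
    nodes = [start]
    parents = {0: None}
    for _ in range(n_iters):
        proposal, confidence = policy_fn(nodes[-1], goal)
        if random.random() < confidence:
            target = proposal                       # exploit the network's suggestion
        else:
            target = (random.uniform(*bounds),      # explore: uniform sample
                      random.uniform(*bounds))
        # Extend the nearest tree node a small step toward the target.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        near = nodes[i_near]
        d = math.dist(near, target)
        if d == 0.0:
            continue
        step = min(step_size, d)
        new = (near[0] + step * (target[0] - near[0]) / d,
               near[1] + step * (target[1] - near[1]) / d)
        if not is_free(new):                        # reject samples inside obstacles
            continue
        nodes.append(new)
        parents[len(nodes) - 1] = i_near
        if math.dist(new, goal) < goal_tol:         # simple distance goal test; the memo
            path, i = [], len(nodes) - 1            # instead learns a goal predictor
            while i is not None:                    # conditioned on the command's parse
                path.append(nodes[i])
                i = parents[i]
            return list(reversed(path))
    return None

# Dummy stand-ins for the learned components and the environment:
def greedy_policy(state, goal):
    return goal, 0.5                                # always propose the goal, half-confident

path = rrt_with_policy((0.1, 0.1), (0.9, 0.9), greedy_policy,
                       is_free=lambda q: not (0.4 < q[0] < 0.6 and q[1] < 0.7))
print(len(path) if path else "no path found")

With a confident, accurate policy this loop behaves like a goal-directed planner; with a low-confidence policy it degrades gracefully to plain RRT, which is the complementarity between planner and network that the abstract describes.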
format Article
id mit-1721.1/141354
institution Massachusetts Institute of Technology
publishDate 2022
publisher Center for Brains, Minds and Machines (CBMM), Computation and Systems Neuroscience (Cosyne)
funding This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
date_issued 2020-05-31
type Article; Technical Report; Working Paper
series CBMM Memo; 124
file_format application/pdf
title Deep compositional robotic planners that follow natural language commands
url https://hdl.handle.net/1721.1/141354