Neural Feature Fields for Language-Guided Robot Manipulation
Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models. We present a few-shot learning method for 6-DOF grasping and placing that harnesses these strong spatial and semantic priors to achieve in-the-wild generalization to unseen objects. Using features distilled from a vision-language model, CLIP, we present a way to designate novel objects for manipulation via free-text natural language, and demonstrate its ability to generalize to unseen expressions and novel categories of objects.
Main Author: | Shen, William |
---|---|
Other Authors: | Kaelbling, Leslie P.; Lozano-Pérez, Tomás |
Department: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Thesis (S.M.) |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/152770 https://orcid.org/0009-0004-0227-1071 |
Rights: | In Copyright - Educational Use Permitted; copyright retained by author(s); https://rightsstatements.org/page/InC-EDU/1.0/ |
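
The abstract's core mechanism, distilling 2D CLIP features into a 3D field and querying it with free-text language, can be illustrated with a minimal sketch. The snippet below is not the thesis's implementation: the feature field outputs and the CLIP text embedding are stood in by random placeholder arrays, and the cosine-similarity lookup over sampled 3D points is only an assumed reading of how a language query would select a target region.

```python
# Hypothetical sketch of a language query against a distilled feature field.
# In the thesis, a field f(x) maps 3D points to CLIP-aligned feature vectors;
# here both the field outputs and the text embedding are random placeholders.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `a` and the vector `b`."""
    a_n = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b_n = b / np.linalg.norm(b)
    return a_n @ b_n

rng = np.random.default_rng(0)

# Placeholder: N points sampled in the robot's workspace, plus the distilled
# (CLIP-aligned) feature vector the field would return at each point.
points = rng.uniform(-1.0, 1.0, size=(4096, 3))
point_features = rng.normal(size=(4096, 512))

# Placeholder for a CLIP text embedding of a query such as "the red mug"
# (512-d matches CLIP ViT-B/32; the real embedding comes from the text encoder).
text_embedding = rng.normal(size=(512,))

# Score every point against the language query and keep the best-matching
# region as a crude localization estimate.
scores = cosine_similarity(point_features, text_embedding)
top_k = np.argsort(scores)[-32:]          # indices of highest-similarity points
target_region = points[top_k].mean(axis=0)
print("candidate target region (xyz):", target_region)
```

In the actual method, `point_features` would presumably come from querying the distilled feature field at sampled points and `text_embedding` from CLIP's text encoder; the high-similarity region would then feed into the few-shot 6-DOF grasp and place pipeline the abstract describes.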