3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM
Main Author: | Rosinol, Antoni |
---|---|
Other Authors: | Carlone, Luca |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2023 |
Online Access: | https://hdl.handle.net/1721.1/150288 |
_version_ | 1826202051605954560 |
---|---|
author | Rosinol, Antoni |
author2 | Carlone, Luca |
author_facet | Carlone, Luca Rosinol, Antoni |
author_sort | Rosinol, Antoni |
collection | MIT |
description | 3D Spatial Perception is the ability of an agent to perceive and understand the three-dimensional structure of its environment, including its position and orientation within that environment. This ability is essential for autonomous robots to navigate and interact with their surroundings, since it enables robots to perform a wide variety of tasks, such as obstacle avoidance, path planning, and object manipulation.
To provide robots with a detailed and accurate representation of the surrounding environment, this thesis first proposes a map representation that is geometrically dense, photometrically accurate, and semantically annotated. We define these maps as metric-semantic maps, and provide algorithms to build such maps in real time. Metric-semantic maps allow both humans and robots to have a shared understanding of the scene, while providing the robot with sufficient information to localize, plan shortest paths, and avoid obstacles along the way. We then present a novel 3D representation that abstracts a dense metric-semantic map into higher-level concepts – such as rooms, corridors, and buildings – and also encodes static objects and dynamic entities. We define such representations as 3D Dynamic Scene Graphs (DSGs), and also provide algorithms to build them. Finally, we show how these approaches can be combined to form a Spatial Perception Engine capable of building both metric-semantic maps and 3D DSGs from visual and inertial data. We also demonstrate the effectiveness of 3D DSGs for fast semantic path-planning queries, which can be used to direct robots using natural language commands.
In addition to the algorithms presented in this thesis, we open-source our code and datasets for the research community to use and explore. We believe that the algorithms and resources provided in this thesis open up exciting new possibilities in the field of 3D spatial perception, and we hope to inspire further research in this area, with the ultimate goal of creating fully autonomous robots that are able to navigate and operate in complex environments. |
first_indexed | 2024-09-23T12:01:06Z |
format | Thesis |
id | mit-1721.1/150288 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T12:01:06Z |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/150288 2023-04-01T03:35:19Z 3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM Rosinol, Antoni Carlone, Luca Leonard, John J. Massachusetts Institute of Technology. Department of Aeronautics and Astronautics Ph.D. 2023-03-31T14:45:21Z 2023-03-31T14:45:21Z 2023-02 2023-02-15T14:06:01.924Z Thesis https://hdl.handle.net/1721.1/150288 0000-0001-5244-0882 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Rosinol, Antoni 3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM |
title | 3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM |
title_sort | 3d spatial perception with real time dense metric semantic slam |
url | https://hdl.handle.net/1721.1/150288 |
work_keys_str_mv | AT rosinolantoni 3dspatialperceptionwithrealtimedensemetricsemanticslam |