3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM

Bibliographic Details
Main Author: Rosinol, Antoni
Other Authors: Carlone, Luca
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/150288
author Rosinol, Antoni
author2 Carlone, Luca
author_facet Carlone, Luca
Rosinol, Antoni
author_sort Rosinol, Antoni
collection MIT
description 3D Spatial Perception is the ability of an agent to perceive and understand the three-dimensional structure of its environment, including its own position and orientation within that environment. This ability is essential for autonomous robots to navigate and interact with their surroundings, since it enables a wide variety of tasks, such as obstacle avoidance, path planning, and object manipulation. To provide robots with a detailed and accurate representation of the surrounding environment, this thesis first proposes a map representation that is geometrically dense, photometrically accurate, and semantically annotated. We define these maps as metric-semantic maps and provide algorithms to build them in real time. Metric-semantic maps give humans and robots a shared understanding of the scene, while providing the robot with sufficient information to localize, plan shortest paths, and avoid obstacles along the way. We then present a novel 3D representation that abstracts a dense metric-semantic map into higher-level concepts – such as rooms, corridors, and buildings – and also encodes static objects and dynamic entities. We define such representations as 3D Dynamic Scene Graphs (DSGs) and also provide algorithms to build them. Finally, we show how these approaches can be combined to form a Spatial Perception Engine capable of building both metric-semantic maps and 3D DSGs from visual and inertial data. We also demonstrate the effectiveness of 3D DSGs for fast semantic path-planning queries, which can be used to direct robots using natural language commands. In addition to the algorithms presented in this thesis, we open-source our code and datasets for the research community to use and explore.
We believe that the algorithms and resources provided in this thesis open up exciting new possibilities in the field of 3D spatial perception, and we hope to inspire further research in this area, with the ultimate goal of creating fully autonomous robots that can navigate and operate in complex environments.
first_indexed 2024-09-23T12:01:06Z
format Thesis
id mit-1721.1/150288
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T12:01:06Z
publishDate 2023
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/150288; 2023-04-01T03:35:19Z; 3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM; Rosinol, Antoni; Carlone, Luca; Leonard, John J.; Massachusetts Institute of Technology. Department of Aeronautics and Astronautics; Ph.D.; 2023-03-31T14:45:21Z; 2023-03-31T14:45:21Z; 2023-02; 2023-02-15T14:06:01.924Z; Thesis; https://hdl.handle.net/1721.1/150288; 0000-0001-5244-0882; In Copyright - Educational Use Permitted; Copyright MIT; http://rightsstatements.org/page/InC-EDU/1.0/; application/pdf; Massachusetts Institute of Technology
spellingShingle Rosinol, Antoni
3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM
title 3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM
url https://hdl.handle.net/1721.1/150288