Long-term Object-based SLAM in Low-dynamic Environments

Bibliographic Details
Main Author: Fu, Jiahui
Other Authors: Leonard, John J.
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/153705
Description
Summary: Simultaneous Localization and Mapping (SLAM) is fundamental for autonomous agents to understand their surroundings. Moreover, for advanced robotic tasks, consistent object-level reasoning is critical, especially for activities involving repetitive traversal of the same environment, such as household cleaning and object retrieval. In a changing world, robots should always be able to locate themselves and their targets while maintaining an up-to-date environment map. Traditional SLAM relies on static geometric primitives extracted from observations and lacks semantic understanding. These unordered sets of points, lines, or planes struggle with object-level interpretation, leading to erroneous estimates when the scene changes. Because the world functions and evolves with objects as its minimal unit, object-aided SLAM is a logical choice. This thesis revolves around long-term object-based SLAM in low-dynamic environments, aiming to bridge the communication gap between SLAM techniques and high-level robotic applications and to enhance SLAM compatibility with object-level perception. It presents three contributions. First, we propose a multi-hypothesis approach for the ambiguity-aware adoption of object poses in object-based SLAM. This approach accommodates the inherent ambiguity arising from occlusion or symmetrical object shapes. We design a multi-hypothesis object pose estimator front end in a mixture-of-experts fashion and use a max-mixture-based back end to infer globally consistent camera and object poses from a sequence of pose hypothesis sets. Second, we develop two change detection approaches, one for offline and one for online applications, built on two novel scene and object representations: PlaneSDF and shape-consistent neural descriptor fields, respectively. For long-term operation, we account for both the inevitable scene changes over extended periods and the efficiency and scalability of the chosen map representations. Furthermore, we explore cluster- and object-level change detection, following a "divide-and-conquer" strategy to enable more accurate and flexible change detection through local scene differencing. Last, we propose a neural SE(3)-equivariant object embedding (NeuSE) for long-term consistent spatial understanding in object-based SLAM. NeuSE is trained to serve as a compact point cloud surrogate for complete object models. Our NeuSE-based object SLAM paradigm directly derives SE(3) camera pose constraints compatible with general SLAM pose graph optimization. This enables object-assisted localization and a lightweight object-centric map with change-aware mapping ability, ultimately achieving robust scene understanding despite temporal environment changes.