MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU
The ability of robots to autonomously navigate through 3D environments depends on their comprehension of spatial concepts, ranging from low-level geometry to high-level semantics, such as objects, places, and buildings. To enable such comprehension, 3D scene graphs have emerged as a robust tool for...
Main Authors: | U. V. B. L. Udugama, G. Vosselman, F. Nex |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2023-12-01 |
Series: | ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences |
Online Access: | https://isprs-annals.copernicus.org/articles/X-1-W1-2023/439/2023/isprs-annals-X-1-W1-2023-439-2023.pdf |
_version_ | 1797403739982659584 |
---|---|
author | U. V. B. L. Udugama; G. Vosselman; F. Nex |
author_facet | U. V. B. L. Udugama; G. Vosselman; F. Nex |
author_sort | U. V. B. L. Udugama |
collection | DOAJ |
description | The ability of robots to autonomously navigate through 3D environments depends on their comprehension of spatial concepts, ranging from low-level geometry to high-level semantics, such as objects, places, and buildings. To enable such comprehension, 3D scene graphs have emerged as a robust tool for representing the environment as a layered graph of concepts and their relationships. However, building these representations in real time with monocular vision systems remains a difficult task that has not been explored in depth. This paper presents Mono-Hydra, a real-time spatial perception system that combines a monocular camera with an IMU and focuses on indoor scenarios. The proposed approach is also adaptable to outdoor applications. The system employs a suite of deep learning algorithms to derive depth and semantics, and it uses a robocentric visual-inertial odometry (VIO) algorithm based on square-root information to keep the odometry from the IMU and the monocular camera consistent. The system achieves sub-20 cm error while processing in real time at 15 fps, enabling real-time 3D scene graph construction on a laptop GPU (NVIDIA 3080). This improves the efficiency and effectiveness of decision-making with simple camera setups and makes robotic systems more agile. We make Mono-Hydra publicly available at: https://github.com/UAV-Centre-ITC/Mono_Hydra. |
first_indexed | 2024-03-09T02:42:46Z |
format | Article |
id | doaj.art-e5980529c6454ef0b7532150ffa81ced |
institution | Directory Open Access Journal |
issn | 2194-9042 2194-9050 |
language | English |
last_indexed | 2024-03-09T02:42:46Z |
publishDate | 2023-12-01 |
publisher | Copernicus Publications |
record_format | Article |
series | ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences |
spelling | doaj.art-e5980529c6454ef0b7532150ffa81ced; 2023-12-06T02:07:14Z; eng; Copernicus Publications; ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences; ISSN 2194-9042, 2194-9050; 2023-12-01; X-1-W1-2023; pp. 439-445; DOI 10.5194/isprs-annals-X-1-W1-2023-439-2023; MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU; U. V. B. L. Udugama, G. Vosselman, F. Nex (all three authors: Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands); https://isprs-annals.copernicus.org/articles/X-1-W1-2023/439/2023/isprs-annals-X-1-W1-2023-439-2023.pdf |
spellingShingle | U. V. B. L. Udugama; G. Vosselman; F. Nex; MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU; ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences |
title | MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU |
title_full | MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU |
title_fullStr | MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU |
title_full_unstemmed | MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU |
title_short | MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU |
title_sort | mono hydra real time 3d scene graph construction from monocular camera input with imu |
url | https://isprs-annals.copernicus.org/articles/X-1-W1-2023/439/2023/isprs-annals-X-1-W1-2023-439-2023.pdf |
work_keys_str_mv | AT uvbludugama monohydrarealtime3dscenegraphconstructionfrommonocularcamerainputwithimu AT gvosselman monohydrarealtime3dscenegraphconstructionfrommonocularcamerainputwithimu AT fnex monohydrarealtime3dscenegraphconstructionfrommonocularcamerainputwithimu |