MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU

Bibliographic Details
Main Authors: U. V. B. L. Udugama, G. Vosselman, F. Nex
Format: Article
Language: English
Published: Copernicus Publications 2023-12-01
Series: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Online Access: https://isprs-annals.copernicus.org/articles/X-1-W1-2023/439/2023/isprs-annals-X-1-W1-2023-439-2023.pdf
collection DOAJ
description The ability of robots to autonomously navigate through 3D environments depends on their comprehension of spatial concepts, ranging from low-level geometry to high-level semantics, such as objects, places, and buildings. To enable such comprehension, 3D scene graphs have emerged as a robust tool for representing the environment as a layered graph of concepts and their relationships. However, building these representations using monocular vision systems in real time remains a difficult task that has not been explored in depth. This paper puts forth a real-time spatial perception system, Mono-Hydra, which combines a monocular camera with an IMU sensor and focuses on indoor scenarios. The proposed approach is nevertheless adaptable to outdoor applications, offering flexibility in its potential uses. The system employs a suite of deep learning algorithms to derive depth and semantics. It uses a robocentric visual-inertial odometry (VIO) algorithm based on square-root information, thereby ensuring consistent visual odometry with an IMU and a monocular camera. The system achieves sub-20 cm error while processing in real time at 15 fps, enabling real-time 3D scene graph construction on a laptop GPU (NVIDIA 3080). This enhances decision-making efficiency and effectiveness in simple camera setups, augmenting robotic system agility. We make Mono-Hydra publicly available at: https://github.com/UAV-Centre-ITC/Mono_Hydra.
id doaj.art-e5980529c6454ef0b7532150ffa81ced
institution Directory Open Access Journal
issn 2194-9042
2194-9050
doi 10.5194/isprs-annals-X-1-W1-2023-439-2023
volume X-1-W1-2023
pages 439-445
affiliation Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands (all three authors)