Interactive Semantic Map Representation for Skill-Based Visual Object Navigation

Visual object navigation is one of the key tasks in mobile robotics. One of the most important components of this task is an accurate semantic representation of the scene, which is needed to determine and reach a goal object. This paper introduces a new representation of a scene semantic map formed during the embodied agent's interaction with the indoor environment. It is based on a neural network method that adjusts the weights of the segmentation model by backpropagating predicted fusion loss values during inference on a regular (backward) or delayed (forward) image sequence. We implement this representation in a full-fledged navigation approach called SkillTron, which can select robot skills from end-to-end policies based on reinforcement learning and from classic map-based planning methods. The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation. We conduct intensive experiments with the proposed approach in the Habitat environment, demonstrating its significant superiority over state-of-the-art approaches in terms of navigation quality metrics. The developed code and custom datasets are publicly available at github.com/AIRI-Institute/skill-fusion.

Bibliographic Details
Main Authors: Tatiana Zemskova, Aleksei Staroverov, Kirill Muravyev, Dmitry A. Yudin, Aleksandr I. Panov
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access
DOI: 10.1109/ACCESS.2024.3380450
ISSN: 2169-3536
Subjects: Semantic map; navigation; robotics; reinforcement learning; frontier-based exploration
Online Access: https://ieeexplore.ieee.org/document/10477345/