Interactive Semantic Map Representation for Skill-Based Visual Object Navigation
Visual object navigation is one of the key tasks in mobile robotics. One of the most important components of this task is the accurate semantic representation of the scene, which is needed to determine and reach a goal object. This paper introduces a new representation of a scene semantic map formed during the embodied agent's interaction with the indoor environment.
Main Authors: | Tatiana Zemskova, Aleksei Staroverov, Kirill Muravyev, Dmitry A. Yudin, Aleksandr I. Panov |
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Access |
Subjects: | Semantic map; navigation; robotics; reinforcement learning; frontier-based exploration |
Online Access: | https://ieeexplore.ieee.org/document/10477345/ |
_version_ | 1797231183541567488 |
author | Tatiana Zemskova; Aleksei Staroverov; Kirill Muravyev; Dmitry A. Yudin; Aleksandr I. Panov |
author_facet | Tatiana Zemskova; Aleksei Staroverov; Kirill Muravyev; Dmitry A. Yudin; Aleksandr I. Panov |
author_sort | Tatiana Zemskova |
collection | DOAJ |
description | Visual object navigation is one of the key tasks in mobile robotics. One of the most important components of this task is the accurate semantic representation of the scene, which is needed to determine and reach a goal object. This paper introduces a new representation of a scene semantic map formed during the embodied agent's interaction with the indoor environment. It is based on a neural network method that adjusts the weights of the segmentation model by backpropagating predicted fusion loss values during inference on a regular (backward) or delayed (forward) image sequence. We integrate this representation into a full-fledged navigation approach called SkillTron, which can select robot skills from both end-to-end reinforcement learning policies and classic map-based planning methods. The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation. We conduct extensive experiments with the proposed approach in the Habitat environment, demonstrating its significant superiority over state-of-the-art approaches in terms of navigation quality metrics. The developed code and custom datasets are publicly available at github.com/AIRI-Institute/skill-fusion. |
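For intuition, below is a minimal sketch (not the authors' released code) of the inference-time adaptation loop the abstract describes: a segmentation network whose weights receive small gradient updates driven by a learned head that predicts the fusion loss, applied over an image sequence in regular or reversed order. The class names, the optimizer choice, and the loss-head architecture are all illustrative assumptions.

```python
# Minimal sketch of interactive test-time adaptation: a segmentation model's
# weights are adjusted during inference by backpropagating a *predicted*
# fusion loss over an image sequence. All names here are assumptions.
import torch
import torch.nn as nn


class FusionLossPredictor(nn.Module):
    """Hypothetical head that predicts a scalar fusion loss from logits."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(num_classes, 64), nn.ReLU(), nn.Linear(64, 1),
        )

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return self.net(logits).mean()


def adapt_on_sequence(segmenter: nn.Module,
                      loss_head: FusionLossPredictor,
                      frames: list,
                      lr: float = 1e-5,
                      reverse: bool = False) -> list:
    """Run inference over a frame sequence with one gradient step per frame.

    reverse=False ~ processing the regular ("backward") sequence,
    reverse=True ~ the delayed ("forward") sequence; this mapping of the
    abstract's terminology is an interpretation, not the paper's definition.
    """
    optimizer = torch.optim.SGD(segmenter.parameters(), lr=lr)
    order = reversed(frames) if reverse else frames
    predictions = []
    for frame in order:
        logits = segmenter(frame)            # per-frame semantic logits
        loss = loss_head(logits)             # predicted fusion loss (scalar)
        optimizer.zero_grad()
        loss.backward()                      # backprop into segmenter weights
        optimizer.step()                     # small test-time weight update
        predictions.append(logits.detach())  # later fused into the semantic map
    return predictions
```

In SkillTron's terms, the adapted per-frame predictions would then be fused into the accumulated semantic map used for goal selection; that fusion step is omitted from the sketch.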
first_indexed | 2024-04-24T15:40:20Z |
format | Article |
id | doaj.art-1a6d61f320414488aecae6bbed9623f4 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-24T15:40:20Z |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | IEEE Access, vol. 12, pp. 44628-44639, 2024-01-01. DOI: 10.1109/ACCESS.2024.3380450. IEEE document 10477345. Title: Interactive Semantic Map Representation for Skill-Based Visual Object Navigation. Authors: Tatiana Zemskova (ORCID 0000-0003-4271-7336, Artificial Intelligence Research Institute (AIRI), Moscow, Russia); Aleksei Staroverov (ORCID 0000-0002-4730-1543, AIRI, Moscow, Russia); Kirill Muravyev (ORCID 0000-0001-5897-0702, Federal Research Center "Computer Science and Control," Moscow, Russia); Dmitry A. Yudin (AIRI, Moscow, Russia); Aleksandr I. Panov (ORCID 0000-0002-9747-3837, AIRI, Moscow, Russia). Keywords: Semantic map; navigation; robotics; reinforcement learning; frontier-based exploration. Online: https://ieeexplore.ieee.org/document/10477345/ |
spellingShingle | Tatiana Zemskova; Aleksei Staroverov; Kirill Muravyev; Dmitry A. Yudin; Aleksandr I. Panov; Interactive Semantic Map Representation for Skill-Based Visual Object Navigation; IEEE Access; Semantic map; navigation; robotics; reinforcement learning; frontier-based exploration |
title | Interactive Semantic Map Representation for Skill-Based Visual Object Navigation |
title_full | Interactive Semantic Map Representation for Skill-Based Visual Object Navigation |
title_fullStr | Interactive Semantic Map Representation for Skill-Based Visual Object Navigation |
title_full_unstemmed | Interactive Semantic Map Representation for Skill-Based Visual Object Navigation |
title_short | Interactive Semantic Map Representation for Skill-Based Visual Object Navigation |
title_sort | interactive semantic map representation for skill based visual object navigation |
topic | Semantic map; navigation; robotics; reinforcement learning; frontier-based exploration |
url | https://ieeexplore.ieee.org/document/10477345/ |
work_keys_str_mv | AT tatianazemskova interactivesemanticmaprepresentationforskillbasedvisualobjectnavigation AT alekseistaroverov interactivesemanticmaprepresentationforskillbasedvisualobjectnavigation AT kirillmuravyev interactivesemanticmaprepresentationforskillbasedvisualobjectnavigation AT dmitryayudin interactivesemanticmaprepresentationforskillbasedvisualobjectnavigation AT aleksandripanov interactivesemanticmaprepresentationforskillbasedvisualobjectnavigation |