Hardware Efficient Direct Policy Imitation Learning for Robotic Navigation in Resource-Constrained Settings

Bibliographic Details
Main Authors: Vidura Sumanasena, Heshan Fernando, Daswin De Silva, Beniel Thileepan, Amila Pasan, Jayathu Samarawickrama, Evgeny Osipov, Damminda Alahakoon
Format: Article
Language: English
Published: MDPI AG 2023-12-01
Series: Sensors
Subjects: imitation learning; direct policy learning; autonomous navigation; mobile robots
Online Access: https://www.mdpi.com/1424-8220/24/1/185
collection DOAJ
description Direct policy learning (DPL) is a widely used approach in imitation learning for time-efficient and effective convergence when training mobile robots. However, using DPL in real-world applications is not sufficiently explored due to the inherent challenges of mobilizing direct human expertise and the difficulty of measuring comparative performance. Furthermore, autonomous systems are often resource-constrained, thereby limiting the potential application and implementation of highly effective deep learning models. In this work, we present a lightweight DPL-based approach to train mobile robots in navigational tasks. We integrated a safety policy alongside the navigational policy to safeguard the robot and the environment. The approach was evaluated in simulations and real-world settings and compared with recent work in this space. The results of these experiments and the efficient transfer from simulations to real-world settings demonstrate that our approach has improved performance compared to its hardware-intensive counterparts. We show that using the proposed methodology, the training agent achieves closer performance to the expert within the first 15 training iterations in simulation and real-world settings.
first_indexed 2024-03-08T14:57:28Z
id doaj.art-830b7d31c4ac4399be05fc3442e3df79
institution Directory Open Access Journal
issn 1424-8220
last_indexed 2024-03-08T14:57:28Z
spelling doaj.art-830b7d31c4ac4399be05fc3442e3df79
2024-01-10T15:08:57Z
eng
MDPI AG
Sensors
1424-8220
2023-12-01
volume 24, issue 1, article 185
10.3390/s24010185
Vidura Sumanasena: Research Centre for Data Analytics and Cognition, La Trobe University, Bundoora, VIC 3083, Australia
Heshan Fernando: Department of Computer Engineering, Rensselaer Polytechnic Institute, New York, NY 12180, USA
Daswin De Silva: Research Centre for Data Analytics and Cognition, La Trobe University, Bundoora, VIC 3083, Australia
Beniel Thileepan: Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK
Amila Pasan: Centre for Wireless Communications, University of Oulu, 90570 Oulu, Finland
Jayathu Samarawickrama: Department of Electronic and Telecom Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka
Evgeny Osipov: Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, 97187 Luleå, Sweden
Damminda Alahakoon: Research Centre for Data Analytics and Cognition, La Trobe University, Bundoora, VIC 3083, Australia
topic imitation learning
direct policy learning
autonomous navigation
mobile robots