A Mapless Local Path Planning Approach Using Deep Reinforcement Learning Framework
Path planning and obstacle avoidance are key modules for autonomous mobile robots. Global path planning based on known maps has been effectively achieved, but local path planning in unknown dynamic environments remains very challenging due to the lack of detailed environmental information and to unpredictability.
Main Authors: Yan Yin, Zhiyu Chen, Gang Liu, Jianwei Guo
Format: Article
Language: English
Published: MDPI AG, 2023-02-01
Series: Sensors
Subjects: D3QN; exploration-exploitation; turtlebot3; n-step; auxiliary reward functions; path planning
Online Access: https://www.mdpi.com/1424-8220/23/4/2036
author | Yan Yin, Zhiyu Chen, Gang Liu, Jianwei Guo
collection | DOAJ |
description | Path planning and obstacle avoidance are key modules for autonomous mobile robots. Global path planning based on known maps has been effectively achieved, but local path planning in unknown dynamic environments remains very challenging due to the lack of detailed environmental information and to unpredictability. This paper proposes an end-to-end local path planner, n-step dueling double DQN with reward-based ϵ-greedy (RND3QN), built on a deep reinforcement learning framework. It takes LiDAR measurements as input and uses a neural network to fit Q-values, from which the corresponding discrete actions are output. Bias is reduced by applying n-step bootstrapping to the deep Q-network (DQN). The ϵ-greedy exploration-exploitation strategy is improved by using the reward value as a measure of exploration, and an auxiliary reward function is introduced to densify the reward distribution in sparse-reward environments. Simulation experiments in Gazebo test the algorithm's effectiveness. The experimental data demonstrate that the average total reward of RND3QN is higher than that of algorithms such as dueling double DQN (D3QN), with success rates increased by 174%, 65%, and 61% over D3QN on three stages, respectively. We also experimented on a TurtleBot3 Waffle Pi robot; the policies learned in simulation transfer effectively to the real robot.
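This record does not include the authors' implementation. As a minimal sketch of the two mechanisms the abstract names — an n-step bootstrapped target on top of (dueling) double DQN, and an ϵ-greedy schedule modulated by reward — the following PyTorch-style code may help; all function names, buffer conventions, and the ϵ-update constants are illustrative assumptions, not the paper's method.

```python
import torch

def n_step_target(rewards, gamma, q_online, q_target, next_state, done):
    """n-step bootstrapped double-DQN target (illustrative).

    rewards:    the n most recent rewards r_t ... r_{t+n-1}
    next_state: state s_{t+n} used for the bootstrap term
    """
    # Discounted sum of the n intermediate rewards.
    g = sum(gamma ** i * r for i, r in enumerate(rewards))
    if not done:
        with torch.no_grad():
            # Double DQN: the online net selects the greedy action,
            # the target net evaluates it.
            a_star = q_online(next_state).argmax(dim=-1, keepdim=True)
            bootstrap = q_target(next_state).gather(-1, a_star).squeeze(-1)
        g = g + gamma ** len(rewards) * bootstrap
    return g

def reward_based_epsilon(eps, avg_reward, prev_avg_reward,
                         eps_min=0.05, decay=0.995, boost=1.01):
    """Hypothetical reward-based epsilon schedule: decay exploration
    while the running average reward improves, and raise it again when
    reward stagnates. The constants and the update rule are assumptions,
    not the paper's definition."""
    if avg_reward > prev_avg_reward:
        return max(eps_min, eps * decay)  # exploit more as learning progresses
    return min(1.0, eps * boost)          # explore more when reward stalls
```

Using the reward itself to steer ϵ, rather than a fixed decay, is what distinguishes the abstract's exploration strategy from vanilla ϵ-greedy; the sketch above captures that idea in its simplest form.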
format | Article |
id | doaj.art-54afa341bfe443cdbeade45e0b7a9e5d |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
doi | 10.3390/s23042036
citation | Sensors, vol. 23, no. 4, art. 2036 (2023-02-01), MDPI AG
affiliation | School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China (all four authors)
title | A Mapless Local Path Planning Approach Using Deep Reinforcement Learning Framework |
topic | D3QN; exploration-exploitation; turtlebot3; n-step; auxiliary reward functions; path planning
url | https://www.mdpi.com/1424-8220/23/4/2036 |