Efficient Path Planning for Mobile Robot Based on Deep Deterministic Policy Gradient

When the traditional Deep Deterministic Policy Gradient (DDPG) algorithm is used for mobile robot path planning, the robot's limited observable environment makes training of the path planning model inefficient and convergence slow. In this paper, Long Short-Term Memory (LSTM) is introduced into the DDPG network so that the robot's previous and current states are combined to determine its actions, and a Batch Norm layer is added after each layer of the Actor network. The reward function is also optimized to guide the robot toward the target point more quickly. To improve learning efficiency, the distance and angle between the robot and the target point are normalized with different methods and used as the input of the DDPG network model. When the model outputs the robot's next action, mixed noise composed of Gaussian noise and Ornstein–Uhlenbeck (OU) noise is added. Finally, experiments are conducted in a simulation environment built with a ROS system and the Gazebo platform. The results show that the proposed algorithm accelerates the convergence of DDPG, improves the generalization ability of the path planning model, and increases the efficiency and success rate of mobile robot path planning.
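This record does not give implementation details, but the "mixed noise composed of Gaussian noise and Ornstein–Uhlenbeck (OU) noise" added to the actor's output can be sketched roughly as follows. All parameter values (`theta`, `sigma`, the blending weight) and the linear blend itself are assumptions for illustration, not the paper's actual settings:

```python
import numpy as np

class OUNoise:
    """Ornstein–Uhlenbeck process: temporally correlated exploration noise
    (drifts back toward a zero mean, so successive samples are smooth)."""
    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1e-2):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.x = np.zeros(dim)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1), with mu = 0
        self.x = self.x + self.theta * (-self.x) * self.dt \
                 + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape)
        return self.x

def mixed_noise(ou, gaussian_sigma=0.1, ou_weight=0.5):
    """Blend correlated OU noise with uncorrelated Gaussian noise.
    The 50/50 weighting is a placeholder, not taken from the paper."""
    g = gaussian_sigma * np.random.randn(*ou.x.shape)
    return ou_weight * ou.sample() + (1.0 - ou_weight) * g

# Usage: perturb a 2-D action (linear and angular velocity) before execution.
ou = OUNoise(dim=2)
action = np.array([0.5, 0.0])
noisy_action = action + mixed_noise(ou)
```

The intuition behind such a mixture is that OU noise encourages consistent exploration over consecutive steps, while the Gaussian component keeps the policy from getting stuck in a single drift direction.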


Bibliographic Details

Main Authors: Hui Gong, Peng Wang, Cui Ni, Nuo Cheng
Format: Article
Language: English
Published: MDPI AG, 2022-05-01
Series: Sensors
Subjects: path planning; DDPG; LSTM; reward function; mixed noise
Online Access: https://www.mdpi.com/1424-8220/22/9/3579
DOI: 10.3390/s22093579
Citation: Sensors, Vol. 22, Issue 9, Article 3579 (2022-05-01)
ISSN: 1424-8220
Author affiliation (all authors): Information Science and Electrical Engineering, Shandong Jiao Tong University, Jinan 250357, China