Efficient Path Planning for Mobile Robot Based on Deep Deterministic Policy Gradient

When the traditional Deep Deterministic Policy Gradient (DDPG) algorithm is used for mobile robot path planning, the robot's limited observable environment makes training of the path planning model inefficient and convergence slow. In this paper, Long Short-Term Memory (LSTM) is introduced into the DDPG network so that the robot's previous and current states are combined to determine its actions, and a Batch Norm layer is added after each layer of the Actor network. The reward function is also optimized to guide the robot toward the target point more quickly. To improve learning efficiency, the distance and angle between the robot and the target point are normalized with different methods and used as the input of the DDPG network model. When the model outputs the robot's next action, mixed noise composed of Gaussian noise and Ornstein–Uhlenbeck (OU) noise is added. Finally, experiments are conducted in a simulation environment built with a ROS system and the Gazebo platform. The results show that the proposed algorithm accelerates the convergence of DDPG, improves the generalization ability of the path planning model, and increases the efficiency and success rate of mobile robot path planning.
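This record does not give implementation details, but the "mixed noise composed of Gaussian noise and Ornstein–Uhlenbeck (OU) noise" added to the actor's output can be sketched roughly as follows. All parameter values (`theta`, `sigma`, the blending weight) and the linear blend itself are assumptions for illustration, not the paper's actual settings:

```python
import numpy as np

class OUNoise:
    """Ornstein–Uhlenbeck process: temporally correlated exploration noise
    (drifts back toward a zero mean, so successive samples are smooth)."""
    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1e-2):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.x = np.zeros(dim)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1), with mu = 0
        self.x = self.x + self.theta * (-self.x) * self.dt \
                 + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape)
        return self.x

def mixed_noise(ou, gaussian_sigma=0.1, ou_weight=0.5):
    """Blend correlated OU noise with uncorrelated Gaussian noise.
    The 50/50 weighting is a placeholder, not taken from the paper."""
    g = gaussian_sigma * np.random.randn(*ou.x.shape)
    return ou_weight * ou.sample() + (1.0 - ou_weight) * g

# Usage: perturb a 2-D action (linear and angular velocity) before execution.
ou = OUNoise(dim=2)
action = np.array([0.5, 0.0])
noisy_action = action + mixed_noise(ou)
```

The intuition behind such a mixture is that OU noise encourages consistent exploration over consecutive steps, while the Gaussian component keeps the policy from getting stuck in a single drift direction.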


Bibliographic Details

Main Authors: Hui Gong, Peng Wang, Cui Ni, Nuo Cheng
Format: Article
Language: English
Published: MDPI AG, 2022-05-01
Series: Sensors
Subjects: path planning; DDPG; LSTM; reward function; mixed noise
Online Access: https://www.mdpi.com/1424-8220/22/9/3579
DOI: 10.3390/s22093579
Citation: Sensors, Vol. 22, Issue 9, Article 3579 (2022-05-01)
ISSN: 1424-8220
Author affiliation (all authors): Information Science and Electrical Engineering, Shandong Jiao Tong University, Jinan 250357, China