DM-DQN: Dueling Munchausen deep Q network for robot path planning

Abstract To achieve collision-free path planning in complex environments, the Munchausen deep Q-learning network (M-DQN) is applied to a mobile robot to learn the best decisions. Building on Soft-DQN, M-DQN adds the scaled log-policy to the immediate reward, which allows the agent to explore more. However, M-DQN converges slowly. This paper proposes an improved M-DQN algorithm, DM-DQN, to address this problem. First, the network is decomposed into a value function and an advantage function, decoupling action selection from action evaluation; this speeds up convergence, improves generalization, and lets the agent learn the best decisions faster. Second, to keep the robot's trajectory from passing too close to obstacle edges, a reward function based on an artificial potential field is proposed to drive the trajectory away from the vicinity of obstacles. Simulation results show that the method learns more efficiently and converges faster than DQN, Dueling DQN, and M-DQN in both static and dynamic environments, and plans collision-free paths that keep their distance from obstacles.
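The dueling decomposition and the Munchausen reward bonus mentioned in the abstract can be made concrete with a short sketch. The Python/PyTorch code below is illustrative only: it shows a generic dueling head, Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), and the standard Munchausen-DQN target in which the scaled, clipped log-policy is added to the immediate reward. Network sizes, hyperparameter values (tau, alpha, l0), and all function names are assumptions, not details taken from the article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)                                # (batch, 1)
        a = self.advantage(h)                            # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)       # (batch, n_actions)


def munchausen_target(q_target, state, action, reward, next_state, done,
                      gamma=0.99, tau=0.03, alpha=0.9, l0=-1.0):
    """Soft-DQN target plus the scaled, clipped log-policy added to the reward.

    `action` is a LongTensor of shape (batch, 1); `reward` and `done` are
    float tensors of shape (batch,).
    """
    with torch.no_grad():
        q_next = q_target(next_state)
        # Policy implied by the target network: pi = softmax(Q / tau).
        log_pi_next = F.log_softmax(q_next / tau, dim=1)
        pi_next = log_pi_next.exp()
        # Soft value of the next state: sum_a' pi(a'|s') [Q(s', a') - tau * log pi(a'|s')].
        soft_next = (pi_next * (q_next - tau * log_pi_next)).sum(dim=1)
        # Munchausen bonus: alpha * clip(tau * log pi(a_t|s_t), l0, 0).
        log_pi = F.log_softmax(q_target(state) / tau, dim=1)
        bonus = alpha * torch.clamp(tau * log_pi.gather(1, action).squeeze(1),
                                    min=l0, max=0.0)
        return reward + bonus + gamma * (1.0 - done) * soft_next
```

An agent would regress the online dueling network's Q(s_t, a_t) toward this target; pairing a dueling decomposition of this kind with the Munchausen target is, per the abstract, the core idea of DM-DQN, though the paper's exact architecture and settings may differ.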

Bibliographic Details
Main Authors: Yuwan Gu, Zhitao Zhu, Jidong Lv, Lin Shi, Zhenjie Hou, Shoukun Xu
Format: Article
Language: English
Published: Springer 2022-12-01
Series: Complex & Intelligent Systems
Subjects: Deep reinforcement learning; DM-DQN; Path planning; Dueling network
Online Access: https://doi.org/10.1007/s40747-022-00948-7
_version_ 1827890164873035776
author Yuwan Gu
Zhitao Zhu
Jidong Lv
Lin Shi
Zhenjie Hou
Shoukun Xu
author_facet Yuwan Gu
Zhitao Zhu
Jidong Lv
Lin Shi
Zhenjie Hou
Shoukun Xu
author_sort Yuwan Gu
collection DOAJ
description Abstract To achieve collision-free path planning in complex environments, the Munchausen deep Q-learning network (M-DQN) is applied to a mobile robot to learn the best decisions. Building on Soft-DQN, M-DQN adds the scaled log-policy to the immediate reward, which allows the agent to explore more. However, M-DQN converges slowly. This paper proposes an improved M-DQN algorithm, DM-DQN, to address this problem. First, the network is decomposed into a value function and an advantage function, decoupling action selection from action evaluation; this speeds up convergence, improves generalization, and lets the agent learn the best decisions faster. Second, to keep the robot's trajectory from passing too close to obstacle edges, a reward function based on an artificial potential field is proposed to drive the trajectory away from the vicinity of obstacles. Simulation results show that the method learns more efficiently and converges faster than DQN, Dueling DQN, and M-DQN in both static and dynamic environments, and plans collision-free paths that keep their distance from obstacles.
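The artificial-potential-field reward mentioned in the description can likewise be sketched. The snippet below is a minimal, assumed formulation (classic quadratic attractive potential toward the goal plus a repulsive potential active within an influence radius d0); the gains k_att, k_rep, the radius d0, and the function name are illustrative and are not taken from the paper.

```python
import numpy as np


def apf_reward(pos, goal, obstacles, k_att=1.0, k_rep=5.0, d0=1.0):
    """Return a shaped reward as the negative total potential at `pos`."""
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    # Attractive potential: grows quadratically with distance to the goal.
    u_att = 0.5 * k_att * np.sum((pos - goal) ** 2)
    # Repulsive potential: only active within the influence radius d0 of an obstacle.
    u_rep = 0.0
    for obs in obstacles:
        d = np.linalg.norm(pos - np.asarray(obs, float))
        if d < d0:
            u_rep += 0.5 * k_rep * (1.0 / max(d, 1e-6) - 1.0 / d0) ** 2
    return -(u_att + u_rep)
```

With a reward of this shape, states inside an obstacle's influence radius are penalized more strongly the closer the robot gets, which is one way to push learned trajectories away from obstacle edges as the abstract describes.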
first_indexed 2024-03-12T21:05:48Z
format Article
id doaj.art-32168163282b494a9a606fb8cb9fd303
institution Directory Open Access Journal
issn 2199-4536
2198-6053
language English
last_indexed 2024-03-12T21:05:48Z
publishDate 2022-12-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj.art-32168163282b494a9a606fb8cb9fd303 (2023-07-30T11:27:59Z). DM-DQN: Dueling Munchausen deep Q network for robot path planning. Yuwan Gu, Zhitao Zhu, Jidong Lv, Lin Shi, Zhenjie Hou, Shoukun Xu; all with the School of Computer Science and Artificial Intelligence, Changzhou University. Complex & Intelligent Systems (Springer; ISSN 2199-4536, eISSN 2198-6053), vol. 9, no. 4, pp. 4287-4300, 2022-12-01, in English. DOI: https://doi.org/10.1007/s40747-022-00948-7. Keywords: Deep reinforcement learning; DM-DQN; Path planning; Dueling network.
spellingShingle Yuwan Gu
Zhitao Zhu
Jidong Lv
Lin Shi
Zhenjie Hou
Shoukun Xu
DM-DQN: Dueling Munchausen deep Q network for robot path planning
Complex & Intelligent Systems
Deep reinforcement learning
DM-DQN
Path planning
Dueling network
title DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_full DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_fullStr DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_full_unstemmed DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_short DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_sort dm dqn dueling munchausen deep q network for robot path planning
topic Deep reinforcement learning
DM-DQN
Path planning
Dueling network
url https://doi.org/10.1007/s40747-022-00948-7
work_keys_str_mv AT yuwangu dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT zhitaozhu dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT jidonglv dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT linshi dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT zhenjiehou dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT shoukunxu dmdqnduelingmunchausendeepqnetworkforrobotpathplanning