DM-DQN: Dueling Munchausen deep Q network for robot path planning

Abstract To achieve collision-free path planning in complex environments, the Munchausen deep Q-learning network (M-DQN) is applied to a mobile robot to learn the best decisions. Building on Soft-DQN, M-DQN adds the scaled log-policy to the immediate reward, which allows the agent to explore more. However, M-DQN converges slowly. This paper proposes an improved M-DQN algorithm, DM-DQN, to address this problem. First, the network is decomposed into a value function and an advantage function, decoupling action selection from action evaluation; this speeds up convergence, improves generalization, and lets the agent learn the best decisions faster. Second, to keep the robot's trajectory from passing too close to obstacle edges, a reward function based on an artificial potential field is proposed to drive the trajectory away from the vicinity of obstacles. Simulation results show that the method learns more efficiently and converges faster than DQN, Dueling DQN, and M-DQN in both static and dynamic environments, and plans collision-free paths that keep their distance from obstacles.
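The dueling decomposition and the Munchausen reward bonus mentioned in the abstract can be made concrete with a short sketch. The Python/PyTorch code below is illustrative only: it shows a generic dueling head, Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), and the standard Munchausen-DQN target in which the scaled, clipped log-policy is added to the immediate reward. Network sizes, hyperparameter values (tau, alpha, l0), and all function names are assumptions, not details taken from the article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)                                # (batch, 1)
        a = self.advantage(h)                            # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)       # (batch, n_actions)


def munchausen_target(q_target, state, action, reward, next_state, done,
                      gamma=0.99, tau=0.03, alpha=0.9, l0=-1.0):
    """Soft-DQN target plus the scaled, clipped log-policy added to the reward.

    `action` is a LongTensor of shape (batch, 1); `reward` and `done` are
    float tensors of shape (batch,).
    """
    with torch.no_grad():
        q_next = q_target(next_state)
        # Policy implied by the target network: pi = softmax(Q / tau).
        log_pi_next = F.log_softmax(q_next / tau, dim=1)
        pi_next = log_pi_next.exp()
        # Soft value of the next state: sum_a' pi(a'|s') [Q(s', a') - tau * log pi(a'|s')].
        soft_next = (pi_next * (q_next - tau * log_pi_next)).sum(dim=1)
        # Munchausen bonus: alpha * clip(tau * log pi(a_t|s_t), l0, 0).
        log_pi = F.log_softmax(q_target(state) / tau, dim=1)
        bonus = alpha * torch.clamp(tau * log_pi.gather(1, action).squeeze(1),
                                    min=l0, max=0.0)
        return reward + bonus + gamma * (1.0 - done) * soft_next
```

An agent would regress the online dueling network's Q(s_t, a_t) toward this target; pairing a dueling decomposition of this kind with the Munchausen target is, per the abstract, the core idea of DM-DQN, though the paper's exact architecture and settings may differ.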

Bibliographic Details
Main Authors: Yuwan Gu, Zhitao Zhu, Jidong Lv, Lin Shi, Zhenjie Hou, Shoukun Xu
Format: Article
Language: English
Published: Springer 2022-12-01
Series: Complex & Intelligent Systems
Subjects: Deep reinforcement learning; DM-DQN; Path planning; Dueling network
Online Access: https://doi.org/10.1007/s40747-022-00948-7
_version_ 1827890164873035776
author Yuwan Gu
Zhitao Zhu
Jidong Lv
Lin Shi
Zhenjie Hou
Shoukun Xu
author_facet Yuwan Gu
Zhitao Zhu
Jidong Lv
Lin Shi
Zhenjie Hou
Shoukun Xu
author_sort Yuwan Gu
collection DOAJ
description Abstract To achieve collision-free path planning in complex environments, the Munchausen deep Q-learning network (M-DQN) is applied to a mobile robot to learn the best decisions. Building on Soft-DQN, M-DQN adds the scaled log-policy to the immediate reward, which allows the agent to explore more. However, M-DQN converges slowly. This paper proposes an improved M-DQN algorithm, DM-DQN, to address this problem. First, the network is decomposed into a value function and an advantage function, decoupling action selection from action evaluation; this speeds up convergence, improves generalization, and lets the agent learn the best decisions faster. Second, to keep the robot's trajectory from passing too close to obstacle edges, a reward function based on an artificial potential field is proposed to drive the trajectory away from the vicinity of obstacles. Simulation results show that the method learns more efficiently and converges faster than DQN, Dueling DQN, and M-DQN in both static and dynamic environments, and plans collision-free paths that keep their distance from obstacles.
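The artificial-potential-field reward mentioned in the description can likewise be sketched. The snippet below is a minimal, assumed formulation (classic quadratic attractive potential toward the goal plus a repulsive potential active within an influence radius d0); the gains k_att, k_rep, the radius d0, and the function name are illustrative and are not taken from the paper.

```python
import numpy as np


def apf_reward(pos, goal, obstacles, k_att=1.0, k_rep=5.0, d0=1.0):
    """Return a shaped reward as the negative total potential at `pos`."""
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    # Attractive potential: grows quadratically with distance to the goal.
    u_att = 0.5 * k_att * np.sum((pos - goal) ** 2)
    # Repulsive potential: only active within the influence radius d0 of an obstacle.
    u_rep = 0.0
    for obs in obstacles:
        d = np.linalg.norm(pos - np.asarray(obs, float))
        if d < d0:
            u_rep += 0.5 * k_rep * (1.0 / max(d, 1e-6) - 1.0 / d0) ** 2
    return -(u_att + u_rep)
```

With a reward of this shape, states inside an obstacle's influence radius are penalized more strongly the closer the robot gets, which is one way to push learned trajectories away from obstacle edges as the abstract describes.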
first_indexed 2024-03-12T21:05:48Z
format Article
id doaj.art-32168163282b494a9a606fb8cb9fd303
institution Directory Open Access Journal
issn 2199-4536
2198-6053
language English
last_indexed 2024-03-12T21:05:48Z
publishDate 2022-12-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj.art-32168163282b494a9a606fb8cb9fd303 (2023-07-30T11:27:59Z). DM-DQN: Dueling Munchausen deep Q network for robot path planning. Yuwan Gu, Zhitao Zhu, Jidong Lv, Lin Shi, Zhenjie Hou, Shoukun Xu; all with the School of Computer Science and Artificial Intelligence, Changzhou University. Complex & Intelligent Systems (Springer; ISSN 2199-4536, eISSN 2198-6053), vol. 9, no. 4, pp. 4287-4300, 2022-12-01, in English. DOI: https://doi.org/10.1007/s40747-022-00948-7. Keywords: Deep reinforcement learning; DM-DQN; Path planning; Dueling network.
spellingShingle Yuwan Gu
Zhitao Zhu
Jidong Lv
Lin Shi
Zhenjie Hou
Shoukun Xu
DM-DQN: Dueling Munchausen deep Q network for robot path planning
Complex & Intelligent Systems
Deep reinforcement learning
DM-DQN
Path planning
Dueling network
title DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_full DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_fullStr DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_full_unstemmed DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_short DM-DQN: Dueling Munchausen deep Q network for robot path planning
title_sort dm dqn dueling munchausen deep q network for robot path planning
topic Deep reinforcement learning
DM-DQN
Path planning
Dueling network
url https://doi.org/10.1007/s40747-022-00948-7
work_keys_str_mv AT yuwangu dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT zhitaozhu dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT jidonglv dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT linshi dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT zhenjiehou dmdqnduelingmunchausendeepqnetworkforrobotpathplanning
AT shoukunxu dmdqnduelingmunchausendeepqnetworkforrobotpathplanning