Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method

Unmanned aerial vehicles (UAVs) are important in reconnaissance missions because of their flexibility and convenience. Vitally, UAVs are capable of autonomous navigation, which means they can be used to plan safe paths to target positions in dangerous surroundings. Traditional path-planning algorith...

Bibliographic Details
Main Authors: Yu Chen, Qi Dong, Xiaozhou Shang, Zhenyu Wu, Jinyu Wang
Format: Article
Language: English
Published: MDPI AG 2022-12-01
Series: Drones
Subjects: multi-UAV; path planning; incomplete information; multi-objective; reinforcement learning
Online Access: https://www.mdpi.com/2504-446X/7/1/10
_version_ 1827626701153107968
author Yu Chen
Qi Dong
Xiaozhou Shang
Zhenyu Wu
Jinyu Wang
author_facet Yu Chen
Qi Dong
Xiaozhou Shang
Zhenyu Wu
Jinyu Wang
author_sort Yu Chen
collection DOAJ
description Unmanned aerial vehicles (UAVs) are important in reconnaissance missions because of their flexibility and convenience. Vitally, UAVs are capable of autonomous navigation, which means they can be used to plan safe paths to target positions in dangerous surroundings. Traditional path-planning algorithms do not perform well when the environmental state is dynamic and partially observable. It is difficult for a UAV to make the correct decision with incomplete information. In this study, we proposed a multi-UAV path planning algorithm based on multi-agent reinforcement learning, which adopts a centralized training–decentralized execution architecture to coordinate all the UAVs. Additionally, we introduced a hidden state of the recurrent neural network to utilize the historical observation information. To solve the multi-objective optimization problem, we designed a joint reward function to guide UAVs to learn optimal policies under multiple constraints. The results demonstrate that by using our method, we were able to solve the problem of incomplete information and low efficiency caused by partial observations and sparse rewards in reinforcement learning, and we realized multi-UAV cooperative autonomous path planning in an unknown environment.
first_indexed 2024-03-09T13:01:17Z
format Article
id doaj.art-e8b74384f6e64b709ddf3ff1d94ff758
institution Directory Open Access Journal
issn 2504-446X
language English
last_indexed 2024-03-09T13:01:17Z
publishDate 2022-12-01
publisher MDPI AG
record_format Article
series Drones
spelling doaj.art-e8b74384f6e64b709ddf3ff1d94ff758
2023-11-30T21:55:02Z
eng
MDPI AG
Drones
2504-446X
2022-12-01
7
1
10
10.3390/drones7010010
Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method
Yu Chen (0)
Qi Dong (1)
Xiaozhou Shang (2)
Zhenyu Wu (3)
Jinyu Wang (4)
Institute of Advanced Technology, University of Science and Technology of China, Hefei 230026, China
Institute of Advanced Technology, University of Science and Technology of China, Hefei 230026, China
China Academy of Electronics and Information Technology, Beijing 100049, China
School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Unmanned aerial vehicles (UAVs) are important in reconnaissance missions because of their flexibility and convenience. Vitally, UAVs are capable of autonomous navigation, which means they can be used to plan safe paths to target positions in dangerous surroundings. Traditional path-planning algorithms do not perform well when the environmental state is dynamic and partially observable. It is difficult for a UAV to make the correct decision with incomplete information. In this study, we proposed a multi-UAV path planning algorithm based on multi-agent reinforcement learning, which adopts a centralized training–decentralized execution architecture to coordinate all the UAVs. Additionally, we introduced a hidden state of the recurrent neural network to utilize the historical observation information. To solve the multi-objective optimization problem, we designed a joint reward function to guide UAVs to learn optimal policies under multiple constraints. The results demonstrate that by using our method, we were able to solve the problem of incomplete information and low efficiency caused by partial observations and sparse rewards in reinforcement learning, and we realized multi-UAV cooperative autonomous path planning in an unknown environment.
https://www.mdpi.com/2504-446X/7/1/10
multi-UAV
path planning
incomplete information
multi-objective
reinforcement learning
spellingShingle Yu Chen
Qi Dong
Xiaozhou Shang
Zhenyu Wu
Jinyu Wang
Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method
Drones
multi-UAV
path planning
incomplete information
multi-objective
reinforcement learning
title Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method
title_full Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method
title_fullStr Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method
title_full_unstemmed Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method
title_short Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method
title_sort multi uav autonomous path planning in reconnaissance missions considering incomplete information a reinforcement learning method
topic multi-UAV
path planning
incomplete information
multi-objective
reinforcement learning
url https://www.mdpi.com/2504-446X/7/1/10
work_keys_str_mv AT yuchen multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod
AT qidong multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod
AT xiaozhoushang multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod
AT zhenyuwu multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod
AT jinyuwang multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod