Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method
Unmanned aerial vehicles (UAVs) are important in reconnaissance missions because of their flexibility and convenience. Vitally, UAVs are capable of autonomous navigation, which means they can be used to plan safe paths to target positions in dangerous surroundings. Traditional path-planning algorith...
Main Authors: | Yu Chen, Qi Dong, Xiaozhou Shang, Zhenyu Wu, Jinyu Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-12-01 |
Series: | Drones |
Subjects: | multi-UAV, path planning, incomplete information, multi-objective, reinforcement learning |
Online Access: | https://www.mdpi.com/2504-446X/7/1/10 |
_version_ | 1827626701153107968 |
---|---|
author | Yu Chen Qi Dong Xiaozhou Shang Zhenyu Wu Jinyu Wang |
author_facet | Yu Chen Qi Dong Xiaozhou Shang Zhenyu Wu Jinyu Wang |
author_sort | Yu Chen |
collection | DOAJ |
description | Unmanned aerial vehicles (UAVs) are important in reconnaissance missions because of their flexibility and convenience. Vitally, UAVs are capable of autonomous navigation, which means they can be used to plan safe paths to target positions in dangerous surroundings. Traditional path-planning algorithms do not perform well when the environmental state is dynamic and partially observable. It is difficult for a UAV to make the correct decision with incomplete information. In this study, we proposed a multi-UAV path planning algorithm based on multi-agent reinforcement learning which adopts a centralized training–decentralized execution architecture to coordinate all the UAVs. Additionally, we introduced the hidden state of a recurrent neural network to utilize historical observation information. To solve the multi-objective optimization problem, we designed a joint reward function to guide the UAVs to learn optimal policies under multiple constraints. The results demonstrate that by using our method, we were able to solve the problems of incomplete information and low efficiency caused by partial observations and sparse rewards in reinforcement learning, and we realized multi-UAV cooperative autonomous path planning in an unknown environment. |
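To make the approach described above concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of the three ingredients named in the abstract: a recurrent per-UAV actor whose GRU hidden state summarizes the observation history under partial observability, a centralized critic that sees all UAVs' observations and actions during training only, and a weighted joint reward over several mission objectives. All dimensions, weights, reward terms, and class names are illustrative assumptions.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_UAV, HID = 16, 5, 3, 64   # assumed sizes, not from the paper

class RecurrentActor(nn.Module):
    """Per-UAV policy: a GRU hidden state carries historical observations."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM, HID)
        self.gru = nn.GRUCell(HID, HID)
        self.head = nn.Linear(HID, ACT_DIM)

    def forward(self, obs, h):
        x = torch.relu(self.encoder(obs))
        h = self.gru(x, h)                      # update memory of past observations
        return self.head(h), h

class CentralCritic(nn.Module):
    """Centralized training: the critic scores the joint observation-action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_UAV * (OBS_DIM + ACT_DIM), HID), nn.ReLU(),
            nn.Linear(HID, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

def joint_reward(progress, collision, detected, w=(1.0, 5.0, 2.0)):
    """Illustrative multi-objective reward: approach the target while avoiding
    collisions and detection. The terms and weights are assumptions."""
    return w[0] * progress - w[1] * collision - w[2] * detected

# Decentralized execution: each UAV acts from its own observation and hidden state.
actors = [RecurrentActor() for _ in range(N_UAV)]
hidden = [torch.zeros(1, HID) for _ in range(N_UAV)]
obs = [torch.randn(1, OBS_DIM) for _ in range(N_UAV)]   # dummy local observations
actions = []
for i, actor in enumerate(actors):
    logits, hidden[i] = actor(obs[i], hidden[i])
    actions.append(torch.softmax(logits, dim=-1))

# Centralized training step (value estimate only, optimizer omitted for brevity).
critic = CentralCritic()
value = critic(torch.cat(obs, dim=-1), torch.cat(actions, dim=-1))
print(value.shape)  # torch.Size([1, 1])
```

During execution only the per-UAV actors and their hidden states are needed; the centralized critic is used during training and discarded afterwards, which is what makes the decentralized-execution half of the architecture possible.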
first_indexed | 2024-03-09T13:01:17Z |
format | Article |
id | doaj.art-e8b74384f6e64b709ddf3ff1d94ff758 |
institution | Directory Open Access Journal |
issn | 2504-446X |
language | English |
last_indexed | 2024-03-09T13:01:17Z |
publishDate | 2022-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Drones |
spelling | doaj.art-e8b74384f6e64b709ddf3ff1d94ff7582023-11-30T21:55:02ZengMDPI AGDrones2504-446X2022-12-01711010.3390/drones7010010Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning MethodYu Chen0Qi Dong1Xiaozhou Shang2Zhenyu Wu3Jinyu Wang4Institute of Advanced Technology, University of Science and Technology of China, Hefei 230026, ChinaInstitute of Advanced Technology, University of Science and Technology of China, Hefei 230026, ChinaChina Academy of Electronics and Information Technology, Beijing 100049, ChinaSchool of Information and Electronics, Beijing Institute of Technology, Beijing 100081, ChinaSchool of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, ChinaUnmanned aerial vehicles (UAVs) are important in reconnaissance missions because of their flexibility and convenience. Vitally, UAVs are capable of autonomous navigation, which means they can be used to plan safe paths to target positions in dangerous surroundings. Traditional path-planning algorithms do not perform well when the environmental state is dynamic and partially observable. It is difficult for a UAV to make the correct decision with incomplete information. In this study, we proposed a multi-UAV path planning algorithm based on multi-agent reinforcement learning which adopts a centralized training–decentralized execution architecture to coordinate all the UAVs. Additionally, we introduced the hidden state of a recurrent neural network to utilize historical observation information. To solve the multi-objective optimization problem, we designed a joint reward function to guide the UAVs to learn optimal policies under multiple constraints. The results demonstrate that by using our method, we were able to solve the problems of incomplete information and low efficiency caused by partial observations and sparse rewards in reinforcement learning, and we realized multi-UAV cooperative autonomous path planning in an unknown environment.https://www.mdpi.com/2504-446X/7/1/10multi-UAVpath planningincomplete informationmulti-objectivereinforcement learning |
spellingShingle | Yu Chen Qi Dong Xiaozhou Shang Zhenyu Wu Jinyu Wang Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method Drones multi-UAV path planning incomplete information multi-objective reinforcement learning |
title | Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method |
title_full | Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method |
title_fullStr | Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method |
title_full_unstemmed | Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method |
title_short | Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method |
title_sort | multi uav autonomous path planning in reconnaissance missions considering incomplete information a reinforcement learning method |
topic | multi-UAV path planning incomplete information multi-objective reinforcement learning |
url | https://www.mdpi.com/2504-446X/7/1/10 |
work_keys_str_mv | AT yuchen multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod AT qidong multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod AT xiaozhoushang multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod AT zhenyuwu multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod AT jinyuwang multiuavautonomouspathplanninginreconnaissancemissionsconsideringincompleteinformationareinforcementlearningmethod |