Swarm Cooperative Navigation Using Centralized Training and Decentralized Execution
The demand for autonomous UAV swarm operations has been on the rise following the success of UAVs in various challenging tasks. Yet conventional swarm control approaches are inadequate for coping with swarm scalability, computational requirements, and real-time performance. In this paper, we demonstrate the capability of emerging multi-agent reinforcement learning (MARL) approaches to successfully and efficiently make sequential decisions during UAV swarm collaborative tasks. We propose a scalable, real-time MARL approach for UAV collaborative navigation in which members of the swarm have to arrive at their target locations at the same time. Centralized training and decentralized execution (CTDE) is used to achieve this, with a combination of negative and positive reinforcement employed in the reward function. Curriculum learning is used to facilitate the sought performance, especially given the high complexity of the problem, which requires extensive exploration. A UAV model that closely resembles the physical platform is used for training the proposed framework, making training and testing realistic. The scalability of the approach to various swarm sizes, speeds, goal positions, environment dimensions, and UAV masses is showcased in (1) a load drop-off scenario and (2) UAV swarm formation, without requiring any re-training or fine-tuning of the agents. The obtained simulation results demonstrate the effectiveness and generalizability of the proposed MARL framework for cooperative UAV navigation.
| Field | Value |
|---|---|
| Main Authors | Rana Azzam, Igor Boiko, Yahya Zweiri |
| Author Affiliations | Rana Azzam, Yahya Zweiri: Aerospace Engineering Department, Khalifa University of Science and Technology, Abu Dhabi P.O. Box 127788, United Arab Emirates; Igor Boiko: Electrical Engineering and Computer Science Department, Khalifa University of Science and Technology, Abu Dhabi P.O. Box 127788, United Arab Emirates |
| Format | Article |
| Language | English |
| Published | MDPI AG, 2023-03-01 |
| Series | Drones, Vol. 7, Issue 3, Article 193 |
| ISSN | 2504-446X |
| DOI | 10.3390/drones7030193 |
| Subjects | UAV cooperative navigation; multi-agent reinforcement learning; autonomous decision making; centralized training and decentralized execution; curriculum learning |
| Collection | Directory of Open Access Journals (DOAJ), record doaj.art-1e3946cb4ecf4ea2bcea381e9ea3697e |
| Online Access | https://www.mdpi.com/2504-446X/7/3/193 |
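The abstract describes a centralized-training, decentralized-execution (CTDE) setup for the swarm. This record does not specify the paper's network architecture, so the sketch below only illustrates the generic CTDE pattern in PyTorch: a centralized critic that scores the joint observations and actions during training, and per-UAV actors that act on local observations at execution time. All class names, layer sizes, and dimensions (`N_AGENTS`, `OBS_DIM`, `ACT_DIM`) are hypothetical.

```python
# Minimal CTDE sketch. Assumption: the paper's exact architecture is not given in
# this record; this only shows the generic centralized-critic pattern.
import torch
import torch.nn as nn

N_AGENTS = 3   # hypothetical swarm size
OBS_DIM = 12   # hypothetical per-UAV observation size
ACT_DIM = 4    # hypothetical per-UAV action size (e.g., velocity commands)

class Actor(nn.Module):
    """Decentralized policy: each UAV acts on its own local observation only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh(),
        )
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized value function: sees all observations and actions during training."""
    def __init__(self):
        super().__init__()
        in_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

# Execution is decentralized: each actor uses only its own observation.
actors = [Actor() for _ in range(N_AGENTS)]
obs = torch.randn(N_AGENTS, OBS_DIM)
acts = torch.stack([actor(o) for actor, o in zip(actors, obs)])

# Training is centralized: the critic scores the joint state-action of the swarm.
critic = CentralCritic()
q_value = critic(obs.reshape(1, -1), acts.reshape(1, -1))
print(q_value.shape)  # torch.Size([1, 1])
```

Because only the actors are needed at execution time, each UAV can run its policy locally without a central node, which is what lets a CTDE scheme target the real-time, scalable operation the abstract claims.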
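The abstract also states that the reward function mixes negative and positive reinforcement and that curriculum learning is used to cope with the problem's complexity. The actual reward terms, coefficients, and curriculum schedule are not reported in this record; the reward function and staging below are purely illustrative assumptions of how such shaping could look.

```python
# Illustrative only: the reward terms, weights, and curriculum stages used in the
# paper are not reported in this record; everything here is a hypothetical sketch.

def swarm_reward(prev_dist, dist, arrived, all_arrived, collided):
    """Per-UAV reward mixing positive and negative reinforcement (hypothetical weights)."""
    r = 1.0 * (prev_dist - dist)   # positive: progress toward the assigned goal
    if arrived:
        r += 5.0                   # positive: reaching the target location
    if all_arrived:
        r += 10.0                  # positive: bonus for simultaneous arrival of the swarm
    if collided:
        r -= 10.0                  # negative: collision penalty
    r -= 0.01                      # negative: small per-step penalty to discourage loitering
    return r

# A simple curriculum: start with easy episodes and widen the task as training progresses.
CURRICULUM = [
    {"env_size": 10.0, "max_goal_dist": 3.0},   # stage 1: small arena, nearby goals
    {"env_size": 20.0, "max_goal_dist": 8.0},   # stage 2: intermediate difficulty
    {"env_size": 40.0, "max_goal_dist": 20.0},  # stage 3: full difficulty
]

def stage_for(episode, episodes_per_stage=1000):
    """Pick the curriculum stage for a given training episode."""
    return CURRICULUM[min(episode // episodes_per_stage, len(CURRICULUM) - 1)]
```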