Swarm Cooperative Navigation Using Centralized Training and Decentralized Execution

The demand for autonomous UAV swarm operations has been on the rise following the success of UAVs in various challenging tasks. Yet conventional swarm control approaches are inadequate for coping with swarm scalability, computational requirements, and real-time performance. In this paper, we demonst...


Bibliographic Details
Main Authors: Rana Azzam, Igor Boiko, Yahya Zweiri
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Drones
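Subjects: UAV cooperative navigation; multi-agent reinforcement learning; autonomous decision making; centralized training and decentralized execution; curriculum learning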
Online Access: https://www.mdpi.com/2504-446X/7/3/193
author Rana Azzam
Igor Boiko
Yahya Zweiri
collection DOAJ
description The demand for autonomous UAV swarm operations has been on the rise following the success of UAVs in various challenging tasks. Yet conventional swarm control approaches are inadequate for coping with swarm scalability, computational requirements, and real-time performance. In this paper, we demonstrate the capability of emerging multi-agent reinforcement learning (MARL) approaches to successfully and efficiently make sequential decisions during UAV swarm collaborative tasks. We propose a scalable, real-time, MARL approach for UAV collaborative navigation where members of the swarm have to arrive at target locations at the same time. Centralized training and decentralized execution (CTDE) are used to achieve this, where a combination of negative and positive reinforcement is employed in the reward function. Curriculum learning is used to facilitate the sought performance, especially due to the high complexity of the problem which requires extensive exploration. A UAV model that highly resembles the respective physical platform is used for training the proposed framework to make training and testing realistic. The scalability of the platform to various swarm sizes, speeds, goal positions, environment dimensions, and UAV masses has been showcased in (1) a load drop-off scenario, and (2) UAV swarm formation without requiring any re-training or fine-tuning of the agents. The obtained simulation results have proven the effectiveness and generalizability of our proposed MARL framework for cooperative UAV navigation.
first_indexed 2024-03-11T06:39:33Z
format Article
id doaj.art-1e3946cb4ecf4ea2bcea381e9ea3697e
institution Directory Open Access Journal
issn 2504-446X
language English
last_indexed 2024-03-11T06:39:33Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Drones
spelling doaj.art-1e3946cb4ecf4ea2bcea381e9ea3697e 2023-11-17T10:39:43Z eng MDPI AG Drones 2504-446X 2023-03-01 7 3 193 10.3390/drones7030193
Swarm Cooperative Navigation Using Centralized Training and Decentralized Execution
Rana Azzam (0), Igor Boiko (1), Yahya Zweiri (2)
(0) Aerospace Engineering Department, Khalifa University of Science and Technology, Abu Dhabi P.O. Box 127788, United Arab Emirates
(1) Electrical Engineering and Computer Science Department, Khalifa University of Science and Technology, Abu Dhabi P.O. Box 127788, United Arab Emirates
(2) Aerospace Engineering Department, Khalifa University of Science and Technology, Abu Dhabi P.O. Box 127788, United Arab Emirates
https://www.mdpi.com/2504-446X/7/3/193
UAV cooperative navigation; multi-agent reinforcement learning; autonomous decision making; centralized training and decentralized execution; curriculum learning
title Swarm Cooperative Navigation Using Centralized Training and Decentralized Execution
topic UAV cooperative navigation
multi-agent reinforcement learning
autonomous decision making
centralized training and decentralized execution
curriculum learning
url https://www.mdpi.com/2504-446X/7/3/193