Swarm Cooperative Navigation Using Centralized Training and Decentralized Execution
The demand for autonomous UAV swarm operations has been on the rise following the success of UAVs in various challenging tasks. Yet conventional swarm control approaches are inadequate for coping with swarm scalability, computational requirements, and real-time performance. In this paper, we demonstrate the capability of emerging multi-agent reinforcement learning (MARL) approaches to successfully and efficiently make sequential decisions during UAV swarm collaborative tasks. We propose a scalable, real-time MARL approach for UAV collaborative navigation in which members of the swarm have to arrive at their target locations at the same time. Centralized training and decentralized execution (CTDE) is used to achieve this, with a combination of negative and positive reinforcement employed in the reward function. Curriculum learning is used to facilitate the sought performance, especially given the high complexity of the problem, which requires extensive exploration. A UAV model that closely resembles the physical platform is used for training the proposed framework, making training and testing realistic. The scalability of the approach to various swarm sizes, speeds, goal positions, environment dimensions, and UAV masses is showcased in (1) a load drop-off scenario and (2) UAV swarm formation, without requiring any re-training or fine-tuning of the agents. The obtained simulation results demonstrate the effectiveness and generalizability of the proposed MARL framework for cooperative UAV navigation.
| Field | Value |
|---|---|
| Main Authors | Rana Azzam, Igor Boiko, Yahya Zweiri |
| Author Affiliations | Rana Azzam, Yahya Zweiri: Aerospace Engineering Department, Khalifa University of Science and Technology, Abu Dhabi P.O. Box 127788, United Arab Emirates; Igor Boiko: Electrical Engineering and Computer Science Department, Khalifa University of Science and Technology, Abu Dhabi P.O. Box 127788, United Arab Emirates |
| Format | Article |
| Language | English |
| Published | MDPI AG, 2023-03-01 |
| Series | Drones, Vol. 7, Issue 3, Article 193 |
| ISSN | 2504-446X |
| DOI | 10.3390/drones7030193 |
| Subjects | UAV cooperative navigation; multi-agent reinforcement learning; autonomous decision making; centralized training and decentralized execution; curriculum learning |
| Collection | Directory of Open Access Journals (DOAJ), record doaj.art-1e3946cb4ecf4ea2bcea381e9ea3697e |
| Online Access | https://www.mdpi.com/2504-446X/7/3/193 |
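The abstract describes a centralized-training, decentralized-execution (CTDE) setup for the swarm. This record does not specify the paper's network architecture, so the sketch below only illustrates the generic CTDE pattern in PyTorch: a centralized critic that scores the joint observations and actions during training, and per-UAV actors that act on local observations at execution time. All class names, layer sizes, and dimensions (`N_AGENTS`, `OBS_DIM`, `ACT_DIM`) are hypothetical.

```python
# Minimal CTDE sketch. Assumption: the paper's exact architecture is not given in
# this record; this only shows the generic centralized-critic pattern.
import torch
import torch.nn as nn

N_AGENTS = 3   # hypothetical swarm size
OBS_DIM = 12   # hypothetical per-UAV observation size
ACT_DIM = 4    # hypothetical per-UAV action size (e.g., velocity commands)

class Actor(nn.Module):
    """Decentralized policy: each UAV acts on its own local observation only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh(),
        )
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized value function: sees all observations and actions during training."""
    def __init__(self):
        super().__init__()
        in_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

# Execution is decentralized: each actor uses only its own observation.
actors = [Actor() for _ in range(N_AGENTS)]
obs = torch.randn(N_AGENTS, OBS_DIM)
acts = torch.stack([actor(o) for actor, o in zip(actors, obs)])

# Training is centralized: the critic scores the joint state-action of the swarm.
critic = CentralCritic()
q_value = critic(obs.reshape(1, -1), acts.reshape(1, -1))
print(q_value.shape)  # torch.Size([1, 1])
```

Because only the actors are needed at execution time, each UAV can run its policy locally without a central node, which is what lets a CTDE scheme target the real-time, scalable operation the abstract claims.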
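The abstract also states that the reward function mixes negative and positive reinforcement and that curriculum learning is used to cope with the problem's complexity. The actual reward terms, coefficients, and curriculum schedule are not reported in this record; the reward function and staging below are purely illustrative assumptions of how such shaping could look.

```python
# Illustrative only: the reward terms, weights, and curriculum stages used in the
# paper are not reported in this record; everything here is a hypothetical sketch.

def swarm_reward(prev_dist, dist, arrived, all_arrived, collided):
    """Per-UAV reward mixing positive and negative reinforcement (hypothetical weights)."""
    r = 1.0 * (prev_dist - dist)   # positive: progress toward the assigned goal
    if arrived:
        r += 5.0                   # positive: reaching the target location
    if all_arrived:
        r += 10.0                  # positive: bonus for simultaneous arrival of the swarm
    if collided:
        r -= 10.0                  # negative: collision penalty
    r -= 0.01                      # negative: small per-step penalty to discourage loitering
    return r

# A simple curriculum: start with easy episodes and widen the task as training progresses.
CURRICULUM = [
    {"env_size": 10.0, "max_goal_dist": 3.0},   # stage 1: small arena, nearby goals
    {"env_size": 20.0, "max_goal_dist": 8.0},   # stage 2: intermediate difficulty
    {"env_size": 40.0, "max_goal_dist": 20.0},  # stage 3: full difficulty
]

def stage_for(episode, episodes_per_stage=1000):
    """Pick the curriculum stage for a given training episode."""
    return CURRICULUM[min(episode // episodes_per_stage, len(CURRICULUM) - 1)]
```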