PPO-Exp: Keeping Fixed-Wing UAV Formation with Deep Reinforcement Learning

Flocking for fixed-wing Unmanned Aerial Vehicles (UAVs) is an extremely complex challenge due to the fixed-wing UAV's control problem and the difficulty of coordinating the whole system. Recently, flocking approaches based on reinforcement learning have attracted attention. However, current methods also require each UAV to make decisions in a decentralized manner, which increases the cost and computational load of the whole UAV system. This paper studies a low-cost UAV formation system consisting of one leader (equipped with an intelligence chip) and five followers (without an intelligence chip), and proposes a centralized, collision-free formation-keeping method. Communication throughout the whole process is considered, and the protocol is designed by minimizing the communication cost. In addition, an analysis of the Proximal Policy Optimization (PPO) algorithm is provided: the paper derives the estimation error bound and reveals the relationship between the bound and exploration. To encourage the agent to balance exploration against the estimation error bound, a variant of PPO named PPO-Exploration (PPO-Exp) is proposed, which adjusts the clip constraint parameter and makes the exploration mechanism more flexible. Experimental results show that PPO-Exp performs better than current algorithms on these tasks.
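
As a pointer to the mechanism the abstract refers to, here is a minimal sketch of the standard PPO clipped surrogate objective that PPO-Exp builds on (the specific rule by which PPO-Exp adjusts the clip parameter is given in the paper and is not reproduced here):

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\right)\right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
$$

where $\hat{A}_t$ is the advantage estimate and $\epsilon$ is the clip constraint parameter. In standard PPO, $\epsilon$ is a fixed constant; PPO-Exp, as described above, treats it as adjustable so that exploration can be traded off against the estimation error bound.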


Bibliographic Details
Main Authors: Dan Xu, Yunxiao Guo, Zhongyi Yu, Zhenfeng Wang, Rongze Lan, Runhao Zhao, Xinjia Xie, Han Long
Format: Article
Language: English
Published: MDPI AG, 2022-12-01
Series: Drones
ISSN: 2504-446X
DOI: 10.3390/drones7010028 (vol. 7, no. 1, article 28)
Subjects: fixed-wing UAV; formation keeping; reinforcement learning
Online Access: https://www.mdpi.com/2504-446X/7/1/28
Author Affiliations:
Dan Xu, Runhao Zhao: College of System Engineering, National University of Defense Technology, Changsha 410073, China
Yunxiao Guo, Zhenfeng Wang, Rongze Lan, Han Long: College of Sciences, National University of Defense Technology, Changsha 410073, China
Zhongyi Yu: College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha 410073, China
Xinjia Xie: College of Computer Science, National University of Defense Technology, Changsha 410073, China