Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance

Multiple unmanned aerial vehicle (UAV) collaboration has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms in the field of multi-UAV cooperative control. Aiming at the problem of a non...

Bibliographic Details
Main Authors: Weiwei Zhao, Hairong Chu, Xikui Miao, Lihong Guo, Honghai Shen, Chenhao Zhu, Feng Zhang, Dongxin Liang
Format: Article
Language: English
Published: MDPI AG 2020-08-01
Series: Sensors
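Subjects: reinforcement learning; proximal policy optimization (PPO); the joint state-value function; multiagent cooperative; multiple unmanned aerial vehicles (multi-UAV) formation; obstacle avoidance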
Online Access: https://www.mdpi.com/1424-8220/20/16/4546
author Weiwei Zhao
Hairong Chu
Xikui Miao
Lihong Guo
Honghai Shen
Chenhao Zhu
Feng Zhang
Dongxin Liang
author_sort Weiwei Zhao
collection DOAJ
description Multiple unmanned aerial vehicle (UAV) collaboration has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms to multi-UAV cooperative control. In a multi-agent environment, each learning agent keeps changing its strategy, which makes the environment non-stationary from the point of view of the other agents. To address this problem, the paper presents an improved multiagent reinforcement learning algorithm, the multiagent joint proximal policy optimization (MAJPPO) algorithm, which uses centralized learning and decentralized execution. The algorithm uses a moving window averaging method to give each agent a centralized state-value function, so that the agents collaborate better and the sum of reward values obtained by the multiagent system increases. To evaluate the algorithm, we use MAJPPO to complete a multi-UAV formation task and to cross environments with multiple obstacles. To reduce the control complexity of the UAV, we use a six-degree-of-freedom, 12-state dynamics model of the UAV with an attitude control loop. The experimental results show that the MAJPPO algorithm has better performance and better environmental adaptability.
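The moving window averaging mentioned in the description can be illustrated with a small sketch. This is only one plausible reading, not the paper's implementation: it assumes each agent already produces a scalar state-value estimate per time step, averages those estimates across agents, and then smooths the result over a trailing window. The function name `joint_state_values`, the `window` parameter, and the input layout are hypothetical.

```python
from collections import deque


def joint_state_values(per_agent_values, window=3):
    """Illustrative joint value estimate (assumption, not the MAJPPO source).

    per_agent_values: list of equal-length lists, one list of per-step
    state-value estimates for each agent.
    Returns one smoothed joint value per time step.
    """
    n_agents = len(per_agent_values)
    n_steps = len(per_agent_values[0])
    recent = deque(maxlen=window)  # holds at most `window` latest step means
    joint = []
    for t in range(n_steps):
        # Centralized view of step t: mean of all agents' estimates.
        step_mean = sum(agent[t] for agent in per_agent_values) / n_agents
        recent.append(step_mean)
        # Trailing moving average; early steps use the samples seen so far.
        joint.append(sum(recent) / len(recent))
    return joint


if __name__ == "__main__":
    values = [[1, 2, 3, 4], [3, 4, 5, 6]]  # two agents, four time steps
    print(joint_state_values(values, window=2))
```

In a sketch like this, every agent would be trained against the same smoothed value signal, which is one way a shared (centralized) critic target can be built while each agent still acts on its own observations at execution time.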
first_indexed 2024-03-10T17:28:14Z
format Article
id doaj.art-41062f1c83b744b8a3d6b12137fa3c1e
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-10T17:28:14Z
publishDate 2020-08-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-41062f1c83b744b8a3d6b12137fa3c1e
2023-11-20T10:05:48Z
eng
MDPI AG
Sensors
1424-8220
2020-08-01
Volume 20, Issue 16, Article 4546
10.3390/s20164546
Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance
Weiwei Zhao (0): Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dongnanhu Rd., Changchun 130033, China
Hairong Chu (1): Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dongnanhu Rd., Changchun 130033, China
Xikui Miao (2): School of Information Engineering, Henan University of Science and Technology, Luoyang 471000, China
Lihong Guo (3): Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dongnanhu Rd., Changchun 130033, China
Honghai Shen (4): Key Laboratory of Airborne Optical Imaging and Measurement, Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dong Nanhu Road, Changchun 130033, China
Chenhao Zhu (5): Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dongnanhu Rd., Changchun 130033, China
Feng Zhang (6): School of Aviation Operations and Services, Aviation University of the Air Force, No. 2222, Dongnanhu Rd., Changchun 130022, China
Dongxin Liang (7): Xi’an Jiaotong University Health Science Center, No. 76, Yanta West Road, Xi’an 710061, China
title Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance
title_sort research on the multiagent joint proximal policy optimization algorithm controlling cooperative fixed wing uav obstacle avoidance
topic reinforcement learning
proximal policy optimization (PPO)
the joint state-value function
multiagent cooperative
multiple unmanned aerial vehicles (multi-UAV) formation
obstacle avoidance
url https://www.mdpi.com/1424-8220/20/16/4546