Autonomous air combat decision‐making of UAV based on parallel self‐play reinforcement learning

Bibliographic Details
Main Authors: Bo Li, Jingyi Huang, Shuangxia Bai, Zhigang Gan, Shiyang Liang, Neretin Evgeny, Shouwen Yao
Format: Article
Language: English
Published: Wiley 2023-03-01
Series: CAAI Transactions on Intelligence Technology
Subjects: air combat decision; deep reinforcement learning; parallel self‐play; SAC algorithm; UAV
Online Access: https://doi.org/10.1049/cit2.12109
collection DOAJ
description Abstract To address the problem of manoeuvring decision‐making in UAV air combat, this study establishes a one‐to‐one air combat model, defines the missile attack areas, and uses the non‐deterministic policy Soft‐Actor‐Critic (SAC) algorithm from deep reinforcement learning to construct a decision model that realises the manoeuvring process. The complexity of the proposed algorithm is also calculated, and the stability of the closed‐loop air combat decision‐making system controlled by the neural network is analysed using a Lyapunov function. The study formulates the UAV air combat process as a game and proposes a Parallel Self‐Play training SAC algorithm (PSP‐SAC) to improve the generalisation performance of UAV control decisions. Simulation results show that the proposed algorithm realises sample sharing and policy sharing across multiple combat environments and significantly improves the generalisation ability of the model compared with independent training.
id doaj.art-580f320876a94b2a88b82abb478f9014
institution Directory Open Access Journal
issn 2468-2322
spelling doaj.art-580f320876a94b2a88b82abb478f9014; 2023-03-14T08:04:43Z; eng; Wiley; CAAI Transactions on Intelligence Technology; ISSN 2468-2322; 2023-03-01; vol. 8, no. 1, pp. 64-81; https://doi.org/10.1049/cit2.12109
Bo Li: School of Electronics Information, Northwestern Polytechnical University, Xi'an, China
Jingyi Huang: School of Electronics Information, Northwestern Polytechnical University, Xi'an, China
Shuangxia Bai: School of Electronics Information, Northwestern Polytechnical University, Xi'an, China
Zhigang Gan: School of Electronics Information, Northwestern Polytechnical University, Xi'an, China
Shiyang Liang: Avic Luoyang Electro‐optical Equipment Research Institute, Luoyang, China
Neretin Evgeny: Moscow Aviation Institute, 4 Volokolamskoe Highway, Moscow, Russia
Shouwen Yao: School of Mechanical Engineering, Beijing Institute of Technology, Beijing, China
topic air combat decision
deep reinforcement learning
parallel self‐play
SAC algorithm
UAV