Autonomous air combat decision‐making of UAV based on parallel self‐play reinforcement learning
Abstract Aiming at addressing the problem of manoeuvring decision‐making in UAV air combat, this study establishes a one‐to‐one air combat model, defines missile attack areas, and uses the non‐deterministic policy Soft‐Actor‐Critic (SAC) algorithm in deep reinforcement learning to construct a decisi...
Main Authors: | Bo Li, Jingyi Huang, Shuangxia Bai, Zhigang Gan, Shiyang Liang, Neretin Evgeny, Shouwen Yao |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2023-03-01 |
Series: | CAAI Transactions on Intelligence Technology |
Subjects: | air combat decision; deep reinforcement learning; parallel self‐play; SAC algorithm; UAV |
Online Access: | https://doi.org/10.1049/cit2.12109 |
_version_ | 1827990630149652480 |
---|---|
author | Bo Li; Jingyi Huang; Shuangxia Bai; Zhigang Gan; Shiyang Liang; Neretin Evgeny; Shouwen Yao |
author_sort | Bo Li |
collection | DOAJ |
description | Abstract: To address manoeuvring decision‐making in UAV air combat, this study establishes a one‐to‐one air combat model, defines missile attack areas, and uses the Soft‐Actor‐Critic (SAC) algorithm, a non‐deterministic (stochastic) policy method in deep reinforcement learning, to construct a decision model that carries out the manoeuvring process. The complexity of the proposed algorithm is also calculated, and the stability of the closed‐loop air combat decision‐making system controlled by the neural network is analysed using a Lyapunov function. The study treats the UAV air combat process as a game and proposes a Parallel Self‐Play training SAC algorithm (PSP‐SAC) to improve the generalisation performance of UAV control decisions. Simulation results show that the proposed algorithm realises sample sharing and policy sharing across multiple combat environments and significantly improves the generalisation ability of the model compared with independent training. |
first_indexed | 2024-04-10T00:38:23Z |
format | Article |
id | doaj.art-580f320876a94b2a88b82abb478f9014 |
institution | Directory Open Access Journal |
issn | 2468-2322 |
language | English |
last_indexed | 2024-04-10T00:38:23Z |
publishDate | 2023-03-01 |
publisher | Wiley |
record_format | Article |
series | CAAI Transactions on Intelligence Technology |
spelling | doaj.art-580f320876a94b2a88b82abb478f9014; 2023-03-14T08:04:43Z; eng; Wiley; CAAI Transactions on Intelligence Technology; 2468-2322; 2023-03-01; vol. 8, no. 1, pp. 64–81; 10.1049/cit2.12109; Autonomous air combat decision‐making of UAV based on parallel self‐play reinforcement learning; Bo Li (School of Electronics Information, Northwestern Polytechnical University, Xi'an, China); Jingyi Huang (School of Electronics Information, Northwestern Polytechnical University, Xi'an, China); Shuangxia Bai (School of Electronics Information, Northwestern Polytechnical University, Xi'an, China); Zhigang Gan (School of Electronics Information, Northwestern Polytechnical University, Xi'an, China); Shiyang Liang (Avic Luoyang Electro‐optical Equipment Research Institute, Luoyang, China); Neretin Evgeny (Moscow Aviation Institute, 4 Volokolamskoe Highway, Moscow, Russia); Shouwen Yao (School of Mechanical Engineering, Beijing Institute of Technology, Beijing, China); https://doi.org/10.1049/cit2.12109; air combat decision; deep reinforcement learning; parallel self‐play; SAC algorithm; UAV |
title | Autonomous air combat decision‐making of UAV based on parallel self‐play reinforcement learning |
topic | air combat decision; deep reinforcement learning; parallel self‐play; SAC algorithm; UAV |
url | https://doi.org/10.1049/cit2.12109 |