Maneuver Decision-Making through Automatic Curriculum Reinforcement Learning without Handcrafted Reward Functions

Maneuver decision-making is essential for autonomous air combat. However, previous methods usually make decisions to aim at the target instead of hitting the target and use discrete action spaces instead of continuous action spaces. While these simplifications make maneuver decision-making easier, t...


Bibliographic Details
Main Authors: Yujie Wei, Hongpeng Zhang, Yuan Wang, Changqiang Huang
Format: Article
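Language: English
Published: MDPI AG 2023-08-01
Series: Applied Sciences
Subjects: reinforcement learning; curriculum learning; maneuver decision-making; unmanned combat aerial vehicle; sparse rewards
Online Access: https://www.mdpi.com/2076-3417/13/16/9421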
author Yujie Wei
Hongpeng Zhang
Yuan Wang
Changqiang Huang
author_sort Yujie Wei
collection DOAJ
description Maneuver decision-making is essential for autonomous air combat. However, previous methods usually decide how to aim at the target rather than how to hit it, and they use discrete rather than continuous action spaces. While these simplifications make maneuver decision-making easier, they also make it less realistic. Meanwhile, previous studies usually rely on handcrafted reward functions, which are troublesome to design. To solve these problems, we propose an automatic curriculum reinforcement learning method that enables agents to learn effective air combat maneuvers from scratch. Following curriculum reinforcement learning, maneuver decision-making is divided into a series of sub-tasks ordered from easy to difficult, so agents can gradually learn to complete them without handcrafted reward functions. Ablation studies show that automatic curriculum learning is essential for reinforcement learning in this setting; namely, agents cannot make effective decisions without it. Simulations show that, after training, agents make effective decisions in different states, including tracking, attacking, and escaping, and that these decisions are both rational and interpretable.
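The abstract describes splitting the task into sub-tasks ordered from easy to difficult and learning from outcomes alone, without a handcrafted reward. A minimal Python sketch of that general idea follows. It is not the authors' algorithm: the class name `AutoCurriculum`, the sliding-window success-rate promotion rule, the `window` and `threshold` parameters, and the toy `skill` counter standing in for a learning agent are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List


class AutoCurriculum:
    """Hypothetical scheduler: move to a harder sub-task once the recent
    success rate on the current one reaches a threshold."""

    def __init__(self, num_stages: int, window: int = 20, threshold: float = 0.8) -> None:
        if num_stages < 1:
            raise ValueError("need at least one stage")
        self.num_stages = num_stages
        self.window = window
        self.threshold = threshold
        self.stage = 0  # 0 is the easiest sub-task
        self._results: Deque[int] = deque(maxlen=window)

    def record(self, success: bool) -> int:
        """Store one sparse outcome (1 = success, 0 = failure) and
        return the stage to use for the next episode."""
        self._results.append(1 if success else 0)
        window_full = len(self._results) == self.window
        if window_full and sum(self._results) / self.window >= self.threshold:
            if self.stage < self.num_stages - 1:
                self.stage += 1
                self._results.clear()  # start fresh statistics on the new stage
        return self.stage


def run_toy_training(episodes: int = 200) -> List[int]:
    """Toy loop: `skill` is a stand-in for policy quality, not a real agent."""
    curriculum = AutoCurriculum(num_stages=3, window=10, threshold=0.8)
    skill = 0
    history: List[int] = []
    for _ in range(episodes):
        difficulty = curriculum.stage + 1
        success = skill >= difficulty * 5  # only a win/lose signal, no shaped reward
        skill += 1  # stand-in for one policy update
        history.append(curriculum.record(success))
    return history


if __name__ == "__main__":
    stages = run_toy_training(30)
    print(stages)
```

In a real setting the `success` flag would come from the episode outcome in the air combat simulation and the policy update from a reinforcement learning algorithm; the scheduler only decides which sub-task the next episode uses.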
first_indexed 2024-03-11T00:09:03Z
format Article
id doaj.art-dd27c58259e44fc6926cde85795ad5e6
institution Directory of Open Access Journals
issn 2076-3417
language English
last_indexed 2024-03-11T00:09:03Z
publishDate 2023-08-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-dd27c58259e44fc6926cde85795ad5e6; 2023-11-19T00:09:30Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2023-08-01; 13; 16; 9421; 10.3390/app13169421; Maneuver Decision-Making through Automatic Curriculum Reinforcement Learning without Handcrafted Reward Functions; Yujie Wei, Hongpeng Zhang, Yuan Wang, Changqiang Huang (all: Aeronautics Engineering College, Air Force Engineering University, Xi’an 710038, China); https://www.mdpi.com/2076-3417/13/16/9421; reinforcement learning; curriculum learning; maneuver decision-making; unmanned combat aerial vehicle; sparse rewards
title Maneuver Decision-Making through Automatic Curriculum Reinforcement Learning without Handcrafted Reward Functions
title_sort maneuver decision making through automatic curriculum reinforcement learning without handcrafted reward functions
topic reinforcement learning
curriculum learning
maneuver decision-making
unmanned combat aerial vehicle
sparse rewards
url https://www.mdpi.com/2076-3417/13/16/9421