Maneuver Decision-Making through Automatic Curriculum Reinforcement Learning without Handcrafted Reward Functions
Main Authors: | Yujie Wei, Hongpeng Zhang, Yuan Wang, Changqiang Huang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-08-01 |
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/13/16/9421 |
_version_ | 1797585643607425024 |
---|---|
author | Yujie Wei, Hongpeng Zhang, Yuan Wang, Changqiang Huang |
collection | DOAJ |
description | Maneuver decision-making is essential for autonomous air combat. However, previous methods usually make decisions that only aim at the target rather than hit it, and they use discrete rather than continuous action spaces. While these simplifications make maneuver decision-making easier, they also make it less realistic. Meanwhile, previous studies usually rely on handcrafted reward functions, which are troublesome to design. To solve these problems, we propose an automatic curriculum reinforcement learning method that enables agents to learn to maneuver effectively in air combat from scratch. Based on curriculum reinforcement learning, maneuver decision-making is divided into a series of sub-tasks ordered from easy to difficult, so that agents can gradually learn to complete them without handcrafted reward functions. Ablation studies show that automatic curriculum learning is essential for reinforcement learning: without curriculum learning, agents cannot make effective decisions. Simulations show that, after training, agents make effective decisions in different states, including tracking, attacking, and escaping, and that these decisions are both rational and interpretable. |
first_indexed | 2024-03-11T00:09:03Z |
format | Article |
id | doaj.art-dd27c58259e44fc6926cde85795ad5e6 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T00:09:03Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
doi | 10.3390/app13169421 |
citation | Applied Sciences, vol. 13, no. 16, article 9421 (2023) |
affiliation | Aeronautics Engineering College, Air Force Engineering University, Xi’an 710038, China (all four authors) |
title | Maneuver Decision-Making through Automatic Curriculum Reinforcement Learning without Handcrafted Reward Functions |
topic | reinforcement learning; curriculum learning; maneuver decision-making; unmanned combat aerial vehicle; sparse rewards |
url | https://www.mdpi.com/2076-3417/13/16/9421 |
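The description says maneuver decision-making is split into sub-tasks ordered from easy to difficult and learned from sparse rewards rather than handcrafted reward functions, but the record does not say how the sub-tasks are defined or when an agent moves between them. The Python sketch below shows one common way to schedule an automatic curriculum, under stated assumptions: each hypothetical `SubTask` bounds the initial range and off-boresight angle, the reward is +1 only when an episode ends in a hit, and the hypothetical `AutoCurriculum` scheduler advances once the success rate over the last `window` episodes reaches `threshold`. All names, bounds, and thresholds are illustrative and are not taken from the paper.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    """Hypothetical curriculum stage: bounds on the initial engagement geometry."""
    max_range_m: float            # assumed upper bound on initial distance to target
    max_off_boresight_deg: float  # assumed upper bound on initial angle off the nose


def sparse_reward(target_hit: bool, done: bool) -> float:
    """Outcome-only reward: +1 at the end of an episode that ends in a hit, else 0."""
    return 1.0 if (done and target_hit) else 0.0


class AutoCurriculum:
    """Move to the next (harder) sub-task once the recent success rate is high enough."""

    def __init__(self, stages, threshold: float = 0.8, window: int = 100):
        self.stages = list(stages)   # ordered from easy to difficult
        self.threshold = threshold   # assumed promotion criterion
        self.window = window         # number of recent episodes evaluated
        self.level = 0
        self.results = deque(maxlen=window)

    @property
    def current(self) -> SubTask:
        return self.stages[self.level]

    def sample_initial_state(self, rng: random.Random) -> tuple:
        """Draw an initial (range, angle) pair inside the current stage's bounds."""
        stage = self.current
        initial_range = rng.uniform(0.0, stage.max_range_m)
        angle = rng.uniform(-stage.max_off_boresight_deg, stage.max_off_boresight_deg)
        return initial_range, angle

    def record(self, episode_return: float) -> bool:
        """Log one episode's return; return True if the curriculum advanced."""
        # With the sparse reward above, a positive return means the target was hit.
        self.results.append(episode_return > 0.0)
        window_full = len(self.results) == self.window
        can_advance = self.level < len(self.stages) - 1
        if window_full and can_advance:
            success_rate = sum(self.results) / self.window
            if success_rate >= self.threshold:
                self.level += 1
                self.results.clear()  # measure success afresh on the harder sub-task
                return True
        return False


if __name__ == "__main__":
    stages = [SubTask(1000.0, 10.0), SubTask(3000.0, 45.0), SubTask(6000.0, 180.0)]
    curriculum = AutoCurriculum(stages, threshold=0.8, window=10)
    for _ in range(10):
        curriculum.record(sparse_reward(target_hit=True, done=True))
    print(curriculum.level, curriculum.current)
```

With three stages and `window=10`, ten consecutive hits at level 0 move the scheduler to level 1. The success history is then cleared, a design choice that keeps the agent from being promoted again on outcomes it achieved on an easier sub-task.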