Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

Guidance commands of flight vehicles can be regarded as a series of data sets having fixed time intervals; thus, guidance design constitutes a typical sequential decision problem and satisfies the basic conditions for using the deep reinforcement learning (DRL) technique. In this paper, we consider t...


Bibliographic Details
Main Authors:	Xiao Hu, Tianshu Wang, Min Gong, Shaoshi Yang
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Deep reinforcement learning; evolution strategy (ES); guidance design; max-min problem; proximal policy optimization (PPO)
Online Access:	https://ieeexplore.ieee.org/document/10485410/

_version_	1797217377989951488
author	Xiao Hu; Tianshu Wang; Min Gong; Shaoshi Yang
author_facet	Xiao Hu; Tianshu Wang; Min Gong; Shaoshi Yang
author_sort	Xiao Hu
collection	DOAJ
description	Guidance commands of flight vehicles can be regarded as a series of data sets having fixed time intervals; thus, guidance design constitutes a typical sequential decision problem and satisfies the basic conditions for using the deep reinforcement learning (DRL) technique. In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on the DRL technique and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. The evasion distance is defined as the minimum distance between the EFV and the PFV during the escape-and-pursuit process. For the EFV, the objective of the guidance design is to progressively maximize the residual velocity, defined as the EFV’s velocity at the instant when the evasion distance occurs, subject to the constraint imposed by the given evasion distance. Thus, an irregular dynamic max-min problem of extremely large scale is formulated. In this problem, the time instant at which the optimal solution (i.e., the maximum residual velocity satisfying the evasion distance constraint) can be attained is uncertain, and the optimal solution depends on all the intermediate guidance commands generated beforehand. To solve this challenging problem, a two-step strategy is conceived. In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, even though the reward function, the neural network parameters and the learning rate are carefully designed. Therefore, in the second step, we propose to invoke an evolution strategy (ES) based algorithm, which uses the result of PPO as the initial value, to further improve the quality of the solution by searching in the local space. Extensive simulation results demonstrate that the proposed guidance design method based on the PPO algorithm is capable of achieving a residual velocity of 67.24 m/s, higher than the residual velocities achieved by the benchmark soft actor-critic and deep deterministic policy gradient algorithms. Furthermore, the proposed ES-enhanced PPO algorithm outperforms the PPO algorithm by 2.7%, achieving a residual velocity of 69.04 m/s.
first_indexed	2024-04-24T12:00:54Z
format	Article
id	doaj.art-747423c0bf2a427f95cef6a10c059f71
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-04-24T12:00:54Z
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-747423c0bf2a427f95cef6a10c059f71; 2024-04-08T23:01:21Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 48210; 48222; 10.1109/ACCESS.2024.3383322; 10485410; Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning; Xiao Hu (0); Tianshu Wang (1); Min Gong (2), https://orcid.org/0009-0007-3011-7858; Shaoshi Yang (3), https://orcid.org/0000-0003-2395-1637; (0) School of Aerospace Engineering, Tsinghua University, Beijing, China; (1) School of Aerospace Engineering, Tsinghua University, Beijing, China; (2) China Academy of Launch Vehicle Technology, Beijing, China; (3) School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China; https://ieeexplore.ieee.org/document/10485410/; Deep reinforcement learning; evolution strategy (ES); guidance design; max-min problem; proximal policy optimization (PPO)
spellingShingle	Xiao Hu Tianshu Wang Min Gong Shaoshi Yang Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning IEEE Access Deep reinforcement learning evolution strategy (ES) guidance design max-min problem proximal policy optimization (PPO)
title	Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
title_full	Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
title_fullStr	Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
title_full_unstemmed	Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
title_short	Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
title_sort	guidance design for escape flight vehicle using evolution strategy enhanced deep reinforcement learning
topic	Deep reinforcement learning; evolution strategy (ES); guidance design; max-min problem; proximal policy optimization (PPO)
url	https://ieeexplore.ieee.org/document/10485410/
work_keys_str_mv	AT xiaohu guidancedesignforescapeflightvehicleusingevolutionstrategyenhanceddeepreinforcementlearning AT tianshuwang guidancedesignforescapeflightvehicleusingevolutionstrategyenhanceddeepreinforcementlearning AT mingong guidancedesignforescapeflightvehicleusingevolutionstrategyenhanceddeepreinforcementlearning AT shaoshiyang guidancedesignforescapeflightvehicleusingevolutionstrategyenhanceddeepreinforcementlearning
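
Illustrative note on the second step described in the abstract (local search that starts from the PPO result): the following is a minimal sketch of a (1+lambda) evolution strategy, not the authors' implementation. The names es_refine and toy_objective, the step size, the population size and the quadratic objective are all assumptions made for illustration. In the paper, the objective would be the residual velocity under the evasion distance constraint, evaluated by simulation, and the starting point would be the guidance solution produced by PPO.

```python
import random


def toy_objective(theta):
    # Hypothetical stand-in for the real objective (residual velocity under
    # the evasion distance constraint). Its maximum is at theta = [1, 1, ...].
    return -sum((x - 1.0) ** 2 for x in theta)


def es_refine(objective, theta0, sigma=0.1, population=20, generations=50, seed=0):
    """(1+lambda) evolution strategy: maximize `objective` near `theta0`.

    theta0 plays the role of the coarse solution from the first step;
    sigma controls how local the search is (all values are assumptions).
    """
    rng = random.Random(seed)
    parent = list(theta0)
    parent_score = objective(parent)
    for _ in range(generations):
        # Sample lambda offspring as Gaussian perturbations of the parent.
        offspring = [
            [x + rng.gauss(0.0, sigma) for x in parent]
            for _ in range(population)
        ]
        scored = [(objective(child), child) for child in offspring]
        best_score, best_child = max(scored, key=lambda pair: pair[0])
        # Elitist selection: replace the parent only if an offspring is better.
        if best_score > parent_score:
            parent, parent_score = best_child, best_score
    return parent, parent_score


if __name__ == "__main__":
    start = [0.5, 0.5, 0.5]  # hypothetical coarse starting point
    refined, score = es_refine(toy_objective, start)
    print("start score:", toy_objective(start))
    print("refined score:", score)
```

Because selection is elitist, the refined score can never be worse than the score of the starting point, which mirrors the role of the ES step as a refinement of an existing solution rather than a fresh global search.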


Similar Items