An Improved Proximal Policy Optimization Method for Low-Level Control of a Quadrotor

In this paper, a novel deep reinforcement learning algorithm based on Proximal Policy Optimization (PPO) is proposed to achieve the fixed point flight control of a quadrotor. The attitude and position information of the quadrotor is directly mapped to the PWM signals of the four rotors through neura...


Bibliographic Details
Main Authors: Wentao Xue, Hangxing Wu, Hui Ye, Shuyi Shao
Format: Article
Language: English
Published: MDPI AG 2022-04-01
Series: Actuators
Subjects: Proximal Policy Optimization (PPO); quadrotor control; reinforcement learning
Online Access: https://www.mdpi.com/2076-0825/11/4/105
collection DOAJ
description This paper proposes a deep reinforcement learning algorithm based on Proximal Policy Optimization (PPO) for fixed-point flight control of a quadrotor. A neural network maps the quadrotor's attitude and position directly to the PWM signals of the four rotors. To constrain the size of policy updates, a PPO variant based on Monte Carlo approximations is proposed to obtain the optimal penalty coefficient. Penalizing the point probability distance in the policy optimization maintains policy diversity at each policy update. The new surrogate (proxy) objective function is introduced into the actor–critic network, which addresses the problem of PPO falling into local optima. In addition, a compound reward function, designed by analyzing the various states the quadrotor may encounter in flight, accelerates gradient updates along the policy update direction and improves the learning efficiency of the network. Simulations test the generalization ability of the offline policy by changing the wing length and payload of the quadrotor. Compared with the PPO method, the proposed method has higher learning efficiency and better robustness.
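The penalized objective mentioned in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the point probability distance is the squared difference between the old and new policy probabilities of the sampled action, and it uses a fixed, hypothetical penalty coefficient `beta`, whereas the record says the coefficient is obtained through Monte Carlo approximations (not reproduced here). The function name and arguments are illustrative only.

```python
import math
from typing import Sequence


def penalized_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    beta: float,
) -> float:
    """Mean of ratio * advantage - beta * (p_old - p_new)^2 over sampled actions.

    The squared probability difference is an ASSUMED form of the point
    probability distance; beta is a hypothetical fixed penalty coefficient.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have the same length")
    if not logp_new:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        p_new, p_old = math.exp(lp_new), math.exp(lp_old)
        ratio = p_new / p_old             # importance-sampling ratio
        distance = (p_old - p_new) ** 2   # point probability distance (assumed form)
        total += ratio * adv - beta * distance
    return total / len(logp_new)
```

When the new and old policies assign the same probabilities to the sampled actions, the penalty vanishes and the value reduces to the mean advantage; as the probabilities drift apart, the penalty grows quadratically and discourages large policy updates.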
first_indexed 2024-03-09T11:20:11Z
format Article
id doaj.art-61d6ed487ebf4d74a7910fcac45fca1c
institution Directory Open Access Journal
issn 2076-0825
last_indexed 2024-03-09T11:20:11Z
spelling doaj.art-61d6ed487ebf4d74a7910fcac45fca1c
2023-12-01T00:21:20Z
Language: eng
Publisher: MDPI AG
Journal: Actuators
ISSN: 2076-0825
Published: 2022-04-01
Volume 11, Issue 4, Article 105
DOI: 10.3390/act11040105
Title: An Improved Proximal Policy Optimization Method for Low-Level Control of a Quadrotor
Wentao Xue (School of Electronic and Information, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Hangxing Wu (School of Electronic and Information, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Hui Ye (School of Electronic and Information, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Shuyi Shao (College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
URL: https://www.mdpi.com/2076-0825/11/4/105
Keywords: Proximal Policy Optimization (PPO); quadrotor control; reinforcement learning
topic Proximal Policy Optimization (PPO)
quadrotor control
reinforcement learning