An Improved Proximal Policy Optimization Method for Low-Level Control of a Quadrotor

In this paper, a novel deep reinforcement learning algorithm based on Proximal Policy Optimization (PPO) is proposed to achieve fixed-point flight control of a quadrotor. A neural network maps the attitude and position information of the quadrotor directly to the PWM signals of the four rotors. To constrain the size of policy updates, a PPO algorithm based on Monte Carlo approximations is proposed to obtain the optimal penalty coefficient. A policy optimization method that penalizes the point probability distance at each policy update preserves the diversity of the policy. This new surrogate objective function is introduced into the actor–critic network, which solves the problem of PPO becoming trapped in local optima. Moreover, a compound reward function, designed by analyzing the various states the quadrotor may encounter in flight, accelerates the gradient algorithm along the policy update direction and improves the learning efficiency of the network. In simulation, the generalization ability of the offline policy is tested by changing the wing length and payload of the quadrotor. Compared with the standard PPO method, the proposed method achieves higher learning efficiency and better robustness.

Bibliographic Details
Main Authors:	Wentao Xue, Hangxing Wu, Hui Ye, Shuyi Shao
Format:	Article
Language:	English
Published:	MDPI AG 2022-04-01
Series:	Actuators
Subjects:	Proximal Policy Optimization (PPO); quadrotor control; reinforcement learning
Online Access:	https://www.mdpi.com/2076-0825/11/4/105
ISSN:	2076-0825
DOI:	10.3390/act11040105
Citation:	Actuators 2022, 11(4), 105
Affiliations:	Wentao Xue, Hangxing Wu, Hui Ye: School of Electronic and Information, Jiangsu University of Science and Technology, Zhenjiang 212100, China; Shuyi Shao: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Collection:	Directory of Open Access Journals (DOAJ)
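The abstract states that a neural network maps attitude and position information directly to the PWM signals of the four rotors. The following is a minimal Python sketch of that mapping, assuming a hypothetical 12-dimensional state, one tanh hidden layer of 64 units, and a 1000 to 2000 µs pulse-width range. None of these values come from the paper. It shows only the deterministic forward pass; in a PPO actor the output would typically be the mean of a stochastic policy.

```python
import math
import random

# Hypothetical pulse-width range in microseconds (common for hobby ESCs);
# the abstract does not state the range used in the paper.
PWM_MIN, PWM_MAX = 1000.0, 2000.0


def dense(x, weights, biases):
    """Fully connected layer: y[j] = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Hypothetical initialization: small uniform weights, zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def actor_to_pwm(state, layers):
    """Map a state vector directly to four rotor PWM commands."""
    h = state
    for weights, biases in layers[:-1]:
        h = [math.tanh(v) for v in dense(h, weights, biases)]  # hidden layers
    weights, biases = layers[-1]
    out = [math.tanh(v) for v in dense(h, weights, biases)]  # 4 values in [-1, 1]
    # Affine rescale from [-1, 1] to [PWM_MIN, PWM_MAX].
    return [PWM_MIN + (o + 1.0) * 0.5 * (PWM_MAX - PWM_MIN) for o in out]


rng = random.Random(0)
STATE_DIM, HIDDEN = 12, 64  # hypothetical: position, velocity, attitude, body rates
layers = [init_layer(STATE_DIM, HIDDEN, rng), init_layer(HIDDEN, 4, rng)]
pwm = actor_to_pwm([0.0] * STATE_DIM, layers)
print(pwm)  # [1500.0, 1500.0, 1500.0, 1500.0]
```

With a zero state and zero biases every rotor receives the mid-range command; the tanh output layer guarantees commands never leave the assumed PWM range, whatever the weights.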

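One reading of "penalized point probability distance" is the squared difference between the old and new policies' probabilities of the sampled action, subtracted from the usual importance-weighted advantage: L(θ) = E_t[ r_t(θ) A_t − β (π_old(a_t|s_t) − π_θ(a_t|s_t))² ]. The Python sketch below evaluates that surrogate objective from per-sample log-probabilities. The exact form is an assumption, not the paper's formulation, and the function name `pop3d_objective` is hypothetical.

```python
import math
from typing import Sequence


def pop3d_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                    advantages: Sequence[float], beta: float) -> float:
    """Sample mean of r_t * A_t - beta * (pi_old(a_t|s_t) - pi_new(a_t|s_t))**2.

    logp_new / logp_old are log-probabilities of the sampled actions under the
    current policy and the data-collecting policy; advantages estimate A_t.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        p_new, p_old = math.exp(ln), math.exp(lo)
        ratio = math.exp(ln - lo)        # importance ratio r_t(theta)
        penalty = (p_old - p_new) ** 2   # point probability distance at a_t
        total += ratio * adv - beta * penalty
    return total / len(advantages)


# Identical policies: ratio = 1 and zero penalty, so the objective is the mean advantage.
same = [math.log(0.5), math.log(0.2)]
print(pop3d_objective(same, same, [1.0, -2.0], beta=1.0))  # -0.5
```

For discrete actions the probabilities lie in [0, 1], so the penalty is bounded; for continuous actions the same expression would compare densities, which are not bounded, so β would need to be scaled accordingly.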
An Improved Proximal Policy Optimization Method for Low-Level Control of a Quadrotor

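The abstract does not detail how the Monte Carlo approximation yields the penalty coefficient. For comparison, the sketch below shows the standard adaptive KL-penalty rule from the original PPO paper, driven by a Monte Carlo estimate of KL(π_old ‖ π_new) computed from actions sampled under the old policy. This is a swapped-in baseline technique, not the authors' method; the two Gaussian policies and the KL target of 0.01 are hypothetical.

```python
import math
import random


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mc_kl(samples, logp_old, logp_new) -> float:
    """Monte Carlo estimate of KL(pi_old || pi_new) from actions sampled under pi_old."""
    return sum(logp_old(a) - logp_new(a) for a in samples) / len(samples)


def adapt_beta(beta: float, kl: float, kl_target: float) -> float:
    """Adaptive KL-penalty rule: halve beta if KL is small, double it if large."""
    if kl < kl_target / 1.5:
        return beta / 2.0
    if kl > kl_target * 1.5:
        return beta * 2.0
    return beta


rng = random.Random(42)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # actions from pi_old = N(0, 1)
kl_hat = mc_kl(samples,
               lambda a: gaussian_logpdf(a, 0.0, 1.0),   # old policy
               lambda a: gaussian_logpdf(a, 0.5, 1.0))   # hypothetical new policy N(0.5, 1)
beta = adapt_beta(1.0, kl_hat, kl_target=0.01)
print(kl_hat, beta)
```

The estimate converges to the closed-form value KL(N(0, 1) ‖ N(0.5, 1)) = 0.5² / 2 = 0.125, which exceeds 1.5 × 0.01, so the rule doubles β to 2.0 and penalizes the next update more strongly.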
Similar Items