Research on the Deep Deterministic Policy Algorithm Based on the First-Order Inverted Pendulum

With the maturing of artificial intelligence technology, applying intelligent control algorithms in control systems has become a clear trend for meeting the high-performance requirements of modern society. This paper proposes a deep deterministic policy gradient (DDPG) controller design method based on deep reinforcement learning to improve system control performance. First, the optimal control policy of the DDPG algorithm is derived from the Markov decision process and the Actor–Critic algorithm. Second, to avoid the local optima encountered in traditional control systems, the capacity and handling of the DDPG experience pool are adjusted so that positive experience is absorbed, accelerating convergence and enabling efficient training. In addition, to address the overestimation of the Q value in DDPG, the overall structure of the Critic network is modified, which shortens the convergence period of DDPG at low learning rates. Finally, a first-order inverted pendulum control system is constructed in a simulation environment to verify the control effectiveness of PID, DDPG, and the improved DDPG. The simulation results show that the improved DDPG controller responds faster to disturbances and yields smaller displacement and smaller angular displacement of the first-order inverted pendulum. The simulations further show that the improved DDPG algorithm offers better stability and convergence, stronger anti-interference capability, and faster stability recovery. This control method provides a useful reference for applying reinforcement learning to traditional control systems.
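
The abstract states that the optimal control policy is derived from the Markov decision process and the Actor–Critic framework, but the record does not reproduce that derivation. As a reference point only, the standard DDPG updates from the general literature (not equations taken from this article) are summarized below, with Q and μ denoting the Critic and Actor networks, Q′ and μ′ their target copies, and θ the corresponding parameters.

```latex
% Standard DDPG updates (general literature; not reproduced from this article).
% Critic target computed with the target networks Q' and mu':
y_i = r_i + \gamma \, Q'\left( s_{i+1}, \, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'} \right)

% Critic loss over a minibatch of N transitions drawn from the experience pool:
L(\theta^{Q}) = \frac{1}{N} \sum_{i} \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^{2}

% Deterministic policy gradient used to update the Actor:
\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i}
  \left. \nabla_{a} Q(s, a \mid \theta^{Q}) \right|_{s = s_i,\, a = \mu(s_i)}
  \left. \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \right|_{s = s_i}
```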


Bibliographic Details
Main Authors: Hailin Hu, Yuhui Chen, Tao Wang, Fu Feng, Weijin Chen (School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China)
Format: Article
Language: English
Published: MDPI AG 2023-06-01
Series: Applied Sciences, Vol. 13, Iss. 13, Article 7594
ISSN: 2076-3417
DOI: 10.3390/app13137594
Subjects: deep deterministic policy algorithm; optimal control policy; local optimum; overestimation of Q value; traditional control systems
Online Access: https://www.mdpi.com/2076-3417/13/13/7594
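
The record summarizes the two modifications to DDPG (adjusting the capacity and handling of the experience pool so that positive experience is retained, and restructuring the Critic to curb overestimation of the Q value) without giving implementation details. The sketch below is therefore only an illustration under assumptions, not the authors' design: the buffer capacities, the reward threshold that marks a transition as "positive", the sampling mix, and the clipped double-Q (twin-Critic) target are all placeholder choices.

```python
# Illustrative sketch only. The paper's exact experience-pool handling and Critic
# restructuring are not given in this record; capacities, thresholds, the mixing
# fraction, and the clipped double-Q target below are assumptions for demonstration.
import random
from collections import deque

import numpy as np
import torch


class PositiveBiasedReplayBuffer:
    """Replay buffer that keeps a separate pool of high-reward ("positive") transitions."""

    def __init__(self, capacity=50_000, positive_capacity=10_000, reward_threshold=0.0):
        self.general = deque(maxlen=capacity)
        self.positive = deque(maxlen=positive_capacity)   # assumed separate "positive" pool
        self.reward_threshold = reward_threshold           # assumed criterion for "positive"

    def push(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, float(done))
        self.general.append(transition)
        if reward > self.reward_threshold:
            self.positive.append(transition)

    def sample(self, batch_size, positive_fraction=0.25):
        # Mix ordinary and positive transitions so good experience is replayed more often.
        n_pos = min(int(batch_size * positive_fraction), len(self.positive))
        batch = random.sample(self.general, batch_size - n_pos)
        if n_pos:
            batch += random.sample(self.positive, n_pos)
        columns = zip(*batch)
        return [torch.as_tensor(np.array(col), dtype=torch.float32) for col in columns]


def twin_critic_target(critic1, critic2, target_actor, rewards, next_states, dones, gamma=0.99):
    """Clipped double-Q target: taking the minimum of two Critics curbs Q overestimation."""
    with torch.no_grad():
        next_actions = target_actor(next_states)
        q_input = torch.cat([next_states, next_actions], dim=-1)
        q_next = torch.min(critic1(q_input), critic2(q_input)).squeeze(-1)
        return rewards + gamma * (1.0 - dones) * q_next
```

A complete training loop would fit both Critics to this target by minimizing the Critic loss and update the Actor with the deterministic policy gradient shown earlier.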