Research on the Deep Deterministic Policy Algorithm Based on the First-Order Inverted Pendulum

With the maturing of artificial intelligence technology, applying intelligent control algorithms in control systems has become a clear trend for meeting the high-performance requirements of modern society. This paper proposes a deep deterministic policy gradient (DDPG) controller design method based on deep reinforcement learning to improve system control performance. First, the optimal control policy of the DDPG algorithm is derived from the Markov decision process and the Actor–Critic algorithm. Second, to avoid the local optima encountered in traditional control systems, the capacity and handling of the DDPG experience pool are adjusted so that positive experience is absorbed, accelerating convergence and enabling efficient training. In addition, to address the overestimation of the Q value in DDPG, the overall structure of the Critic network is modified, which shortens the convergence period of DDPG at low learning rates. Finally, a first-order inverted pendulum control system is constructed in a simulation environment to verify the control effectiveness of PID, DDPG, and the improved DDPG. The simulation results show that the improved DDPG controller responds faster to disturbances and yields smaller displacement and smaller angular displacement of the first-order inverted pendulum. The simulations further show that the improved DDPG algorithm offers better stability and convergence, stronger anti-interference capability, and faster stability recovery. This control method provides a useful reference for applying reinforcement learning to traditional control systems.
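
The abstract states that the optimal control policy is derived from the Markov decision process and the Actor–Critic framework, but the record does not reproduce that derivation. As a reference point only, the standard DDPG updates from the general literature (not equations taken from this article) are summarized below, with Q and μ denoting the Critic and Actor networks, Q′ and μ′ their target copies, and θ the corresponding parameters.

```latex
% Standard DDPG updates (general literature; not reproduced from this article).
% Critic target computed with the target networks Q' and mu':
y_i = r_i + \gamma \, Q'\left( s_{i+1}, \, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'} \right)

% Critic loss over a minibatch of N transitions drawn from the experience pool:
L(\theta^{Q}) = \frac{1}{N} \sum_{i} \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^{2}

% Deterministic policy gradient used to update the Actor:
\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i}
  \left. \nabla_{a} Q(s, a \mid \theta^{Q}) \right|_{s = s_i,\, a = \mu(s_i)}
  \left. \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \right|_{s = s_i}
```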


Bibliographic Details
Main Authors: Hailin Hu, Yuhui Chen, Tao Wang, Fu Feng, Weijin Chen (School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China)
Format: Article
Language: English
Published: MDPI AG 2023-06-01
Series: Applied Sciences, Vol. 13, Iss. 13, Article 7594
ISSN: 2076-3417
DOI: 10.3390/app13137594
Subjects: deep deterministic policy algorithm; optimal control policy; local optimum; overestimation of Q value; traditional control systems
Online Access: https://www.mdpi.com/2076-3417/13/13/7594
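
The record summarizes the two modifications to DDPG (adjusting the capacity and handling of the experience pool so that positive experience is retained, and restructuring the Critic to curb overestimation of the Q value) without giving implementation details. The sketch below is therefore only an illustration under assumptions, not the authors' design: the buffer capacities, the reward threshold that marks a transition as "positive", the sampling mix, and the clipped double-Q (twin-Critic) target are all placeholder choices.

```python
# Illustrative sketch only. The paper's exact experience-pool handling and Critic
# restructuring are not given in this record; capacities, thresholds, the mixing
# fraction, and the clipped double-Q target below are assumptions for demonstration.
import random
from collections import deque

import numpy as np
import torch


class PositiveBiasedReplayBuffer:
    """Replay buffer that keeps a separate pool of high-reward ("positive") transitions."""

    def __init__(self, capacity=50_000, positive_capacity=10_000, reward_threshold=0.0):
        self.general = deque(maxlen=capacity)
        self.positive = deque(maxlen=positive_capacity)   # assumed separate "positive" pool
        self.reward_threshold = reward_threshold           # assumed criterion for "positive"

    def push(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, float(done))
        self.general.append(transition)
        if reward > self.reward_threshold:
            self.positive.append(transition)

    def sample(self, batch_size, positive_fraction=0.25):
        # Mix ordinary and positive transitions so good experience is replayed more often.
        n_pos = min(int(batch_size * positive_fraction), len(self.positive))
        batch = random.sample(self.general, batch_size - n_pos)
        if n_pos:
            batch += random.sample(self.positive, n_pos)
        columns = zip(*batch)
        return [torch.as_tensor(np.array(col), dtype=torch.float32) for col in columns]


def twin_critic_target(critic1, critic2, target_actor, rewards, next_states, dones, gamma=0.99):
    """Clipped double-Q target: taking the minimum of two Critics curbs Q overestimation."""
    with torch.no_grad():
        next_actions = target_actor(next_states)
        q_input = torch.cat([next_states, next_actions], dim=-1)
        q_next = torch.min(critic1(q_input), critic2(q_input)).squeeze(-1)
        return rewards + gamma * (1.0 - dones) * q_next
```

A complete training loop would fit both Critics to this target by minimizing the Critic loss and update the Actor with the deterministic policy gradient shown earlier.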