Research on the Deep Deterministic Policy Algorithm Based on the First-Order Inverted Pendulum
With the mature development of artificial intelligence technology, the application of intelligent control algorithms in control systems has become a trend to meet the high-performance requirements of modern society. This paper proposes a deep deterministic policy gradient (DDPG) controller design method based on deep reinforcement learning to improve system control performance. First, the optimal control policy of the DDPG algorithm is derived from the Markov decision process and the Actor–Critic algorithm. Second, to avoid the local optima of traditional control systems, the capacity and the settlement method of the DDPG experience pool are adjusted to absorb positive experience, accelerating convergence and enabling efficient training. In addition, to address the overestimation of the Q value in DDPG, the overall structure of the Critic network is changed to shorten the convergence period of DDPG at low learning rates. Finally, a first-order inverted pendulum control system was constructed in a simulation environment to verify the control effectiveness of PID, DDPG, and the improved DDPG. The simulation results reveal that the improved DDPG controller responds faster to disturbances and produces smaller displacement and angular displacement of the first-order inverted pendulum. The simulations further show that the improved DDPG algorithm has better stability and convergence, as well as stronger anti-interference ability and stability recovery. This control method provides a reference for applying reinforcement learning in traditional control systems.

| Main Authors: | Hailin Hu, Yuhui Chen, Tao Wang, Fu Feng, Weijin Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2023-06-01 |
| Series: | Applied Sciences |
| Subjects: | deep deterministic policy algorithm; optimal control policy; local optimum; overestimation of Q value; traditional control systems |
| Online Access: | https://www.mdpi.com/2076-3417/13/13/7594 |

author | Hailin Hu; Yuhui Chen; Tao Wang; Fu Feng; Weijin Chen
collection | DOAJ
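The abstract names two concrete modifications to standard DDPG: an experience pool whose capacity and settlement rule are tuned to absorb "positive" experience, and a restructured Critic that curbs Q-value overestimation. The paper's exact mechanisms are not reproduced in this record, so the following is a hypothetical Python sketch under two common interpretations: a replay buffer that biases sampling toward high-reward transitions, and a TD3-style clipped double-Q target (the minimum of two target critics). The class and function names, the reward-threshold rule, and the `positive_fraction` parameter are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque


class PositiveBiasedReplayBuffer:
    """Sketch of an 'experience pool' that favors positive experience.

    Assumption: a transition is 'positive' when its reward exceeds a
    threshold; a fixed fraction of each mini-batch is drawn from the
    positive subset to speed up convergence.
    """

    def __init__(self, capacity, reward_threshold=0.0, positive_fraction=0.5):
        self.buffer = deque(maxlen=capacity)  # oldest experience is evicted
        self.reward_threshold = reward_threshold
        self.positive_fraction = positive_fraction

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        positives = [t for t in self.buffer if t[2] > self.reward_threshold]
        others = [t for t in self.buffer if t[2] <= self.reward_threshold]
        n_pos = min(len(positives), int(batch_size * self.positive_fraction))
        batch = random.sample(positives, n_pos)
        batch += random.sample(others, min(len(others), batch_size - n_pos))
        return batch


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD3-style target: take the minimum of two target-critic estimates
    to damp the single-Critic overestimation the abstract refers to.
    (Assumed interpretation of the restructured Critic network.)"""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)
```

A usage sketch: fill the buffer during rollout, then train the critics on `sample(batch_size)` with targets from `clipped_double_q_target`; the min over two critics trades a small pessimistic bias for far less overestimation than a single bootstrapped Q.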
first_indexed | 2024-03-11T01:46:58Z |
format | Article |
id | doaj.art-4249938e99f449ea96d0c644b061f17a |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T01:46:58Z |
publishDate | 2023-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
doi | 10.3390/app13137594
volume | 13
issue | 13
article | 7594
affiliation | School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China (all five authors)
title | Research on the Deep Deterministic Policy Algorithm Based on the First-Order Inverted Pendulum |
topic | deep deterministic policy algorithm; optimal control policy; local optimum; overestimation of Q value; traditional control systems
url | https://www.mdpi.com/2076-3417/13/13/7594 |