A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems
Main Authors: | Junping Hu, Gen Yang, Zhicheng Hou, Gong Zhang, Wenlin Yang, Weijun Wang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2021-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/9326299/ |
_version_ | 1818663206444335104 |
---|---|
author | Junping Hu, Gen Yang, Zhicheng Hou, Gong Zhang, Wenlin Yang, Weijun Wang |
collection | DOAJ |
description | Adaptive dynamic programming (ADP) is generally implemented using three neural networks: a model network, an action network, and a critic network. In conventional value iteration ADP, the model network is initialized randomly and trained by the backpropagation algorithm, which can easily become trapped in a local minimum; moreover, both the critic network and the action network are trained in every outer loop, which is time-consuming. To approximate the optimal control policy more accurately and to reduce the training time of value iteration ADP, we propose a nearer optimal and faster trained value iteration ADP for discrete-time nonlinear systems in this study. First, before training the model network with backpropagation, we use a global search method, namely a genetic algorithm, to evolve the weights and biases of the neural network for a few generations. Second, in the outer-loop training process, we propose a trigger mechanism that decides whether or not to train the action network, which saves considerable training time. Examples of both linear and nonlinear systems are presented to verify the advantages of the proposed method over conventional value iteration ADP. The simulation results show that the proposed algorithm provides a control policy closer to the optimal one and requires less training time than conventional value iteration ADP. |
first_indexed | 2024-12-17T05:13:10Z |
format | Article |
id | doaj.art-0d244ef891684464b56abf209f38778d |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-17T05:13:10Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-0d244ef891684464b56abf209f38778d; 2022-12-21T22:02:12Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 14933-14944; DOI 10.1109/ACCESS.2021.3051984; article 9326299; ORCIDs: Gen Yang https://orcid.org/0000-0001-8606-5523, Zhicheng Hou https://orcid.org/0000-0002-5319-9856, Wenlin Yang https://orcid.org/0000-0002-6725-182X, Weijun Wang https://orcid.org/0000-0001-6011-2598; affiliations: Junping Hu and Gen Yang, College of Mechanical and Electrical Engineering, Central South University, Changsha, China; Zhicheng Hou, Gong Zhang, Wenlin Yang, and Weijun Wang, Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou, China; keywords: ADP, value iteration, genetic algorithm, trigger mechanism |
title | A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems |
topic | ADP; value iteration; genetic algorithm; trigger mechanism |
url | https://ieeexplore.ieee.org/document/9326299/ |
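The record's description says the model network's weights and biases are first evolved by a genetic algorithm for a few generations and only then trained by backpropagation, but it gives no network size, GA operators, or data. The Python sketch below illustrates that two-stage idea under stated assumptions: the plant `plant()`, the 2-6-1 tanh network, truncation selection with elitism, uniform crossover, Gaussian mutation, and every hyperparameter are hypothetical choices, not the authors' settings. It uses only the standard library.

```python
import math
import random

# Hypothetical plant used only to generate data for the model network:
# x_{k+1} = 0.8*sin(x_k) + 0.5*u_k  (an assumption, not taken from the paper)
def plant(x, u):
    return 0.8 * math.sin(x) + 0.5 * u

H = 6                          # hidden units (assumed)
N_PARAMS = 2 * H + H + H + 1   # W1 (H x 2), b1 (H), W2 (H), b2 (1), stored flat

def forward(p, x, u):
    """Model network: inputs (x, u) -> tanh hidden layer -> linear output."""
    hidden = []
    for j in range(H):
        z = p[2 * j] * x + p[2 * j + 1] * u + p[2 * H + j]
        hidden.append(math.tanh(z))
    y = sum(p[3 * H + j] * hidden[j] for j in range(H)) + p[4 * H]
    return y, hidden

def mse(p, data):
    return sum((forward(p, x, u)[0] - t) ** 2 for x, u, t in data) / len(data)

def genetic_init(data, pop_size=20, generations=10, sigma=0.3, rng=random):
    """Stage 1: evolve flat weight vectors for a few generations (elitist GA)."""
    pop = [[rng.uniform(-1, 1) for _ in range(N_PARAMS)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=lambda p: mse(p, data))
        best_history.append(mse(pop[0], data))
        parents = pop[: pop_size // 2]               # truncation selection
        children = [pop[0][:]]                       # elitism: keep best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child = [c + rng.gauss(0, sigma) for c in child]  # Gaussian mutation
            children.append(child)
        pop = children
    pop.sort(key=lambda p: mse(p, data))
    return pop[0], best_history

def backprop(p, data, lr=0.05, epochs=200):
    """Stage 2: full-batch gradient descent on the MSE, keeping the best weights seen."""
    p = p[:]
    best_p, best_loss = p[:], mse(p, data)
    for _ in range(epochs):
        grad = [0.0] * N_PARAMS
        for x, u, t in data:
            y, hidden = forward(p, x, u)
            e = 2.0 * (y - t) / len(data)                  # dLoss/dy
            grad[4 * H] += e                               # b2
            for j in range(H):
                grad[3 * H + j] += e * hidden[j]           # W2[j]
                dz = e * p[3 * H + j] * (1 - hidden[j] ** 2)  # back through tanh
                grad[2 * j] += dz * x                      # W1[j][0]
                grad[2 * j + 1] += dz * u                  # W1[j][1]
                grad[2 * H + j] += dz                      # b1[j]
        p = [pi - lr * gi for pi, gi in zip(p, grad)]
        loss = mse(p, data)
        if loss < best_loss:
            best_p, best_loss = p[:], loss
    return best_p, best_loss

rng = random.Random(0)
data = []
for _ in range(50):
    x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append((x, u, plant(x, u)))

ga_params, ga_history = genetic_init(data, rng=rng)
ga_loss = mse(ga_params, data)
bp_params, bp_loss = backprop(ga_params, data)
print(f"GA best MSE: {ga_loss:.4f}, after backprop: {bp_loss:.4f}")
```

Because the GA carries its best individual into every new generation, the best training error never increases from one generation to the next; backpropagation then starts from that individual instead of a random draw, which is the mechanism the description relies on to lower the risk of a poor local minimum. Keeping the best weights seen during gradient descent is an extra safeguard added in this sketch.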
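The description also mentions a trigger mechanism that decides, in each outer loop, whether the action network is retrained, but it does not state the trigger condition. As a hypothetical illustration only, the sketch below replaces both networks with exact forms for a scalar linear system (critic V(x) = P x^2, action u = -K x, i.e. standard discrete-time LQR value iteration started from V_0 = 0) and assumes the trigger fires when the critic has moved by at least `trigger_eps` since the last action update. The function name `vi_adp_with_trigger`, the plant (a = 1.2, b = 1), the weights Q = R = 1, and the threshold value are all assumptions.

```python
def vi_adp_with_trigger(a, b, q, r, trigger_eps, tol=1e-10, max_iters=1000):
    """Scalar LQR value iteration: 'critic' is P (V(x) = P x^2),
    'action' is the gain K (u = -K x). Hypothetical trigger on critic change."""
    p = 0.0            # V_0(x) = 0, the usual value-iteration starting point
    k = 0.0
    p_last = None      # critic value at the most recent action update
    action_updates = 0
    for it in range(1, max_iters + 1):
        # Assumed trigger: retrain the action only if the critic moved enough.
        if p_last is None or abs(p - p_last) >= trigger_eps:
            k = a * b * p / (r + b * b * p)   # greedy gain w.r.t. current critic
            p_last = p
            action_updates += 1
        # Critic update with the current (possibly stale) policy.
        p_new = q + r * k * k + (a - b * k) ** 2 * p
        if abs(p_new - p) < tol:
            return p_new, k, it, action_updates
        p = p_new
    return p, k, max_iters, action_updates

# Baseline: trigger_eps = 0 recomputes the action on every iteration.
p_base, k_base, it_base, n_base = vi_adp_with_trigger(1.2, 1.0, 1.0, 1.0, trigger_eps=0.0)
p_trig, k_trig, it_trig, n_trig = vi_adp_with_trigger(1.2, 1.0, 1.0, 1.0, trigger_eps=0.05)
print(f"baseline:  P={p_base:.5f}, K={k_base:.5f}, action updates={n_base}/{it_base}")
print(f"triggered: P={p_trig:.5f}, K={k_trig:.5f}, action updates={n_trig}/{it_trig}")
```

In this example the triggered run updates the action only a handful of times, yet its final critic differs from the baseline only in the fourth significant digit: the gain it keeps once the trigger stops firing is already close to the optimal gain, and the cost of a slightly suboptimal linear gain grows only quadratically with the gain error.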