A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems

Adaptive dynamic programming (ADP) is generally implemented using three neural networks: a model network, an action network, and a critic network. In conventional value iteration ADP, the model network is initialized randomly and trained by the backpropagation algorithm, whose results are...

Bibliographic Details
Main Authors: Junping Hu, Gen Yang, Zhicheng Hou, Gong Zhang, Wenlin Yang, Weijun Wang
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: ADP, value iteration, genetic algorithm, trigger mechanism
Online Access: https://ieeexplore.ieee.org/document/9326299/
author Junping Hu
Gen Yang
Zhicheng Hou
Gong Zhang
Wenlin Yang
Weijun Wang
collection DOAJ
description Adaptive dynamic programming (ADP) is generally implemented using three neural networks: a model network, an action network, and a critic network. In conventional value iteration ADP, the model network is initialized randomly and trained by the backpropagation algorithm, which is prone to getting trapped in a local minimum; both the critic network and the action network are trained in every outer loop, which is time-consuming. To approximate the optimal control policy more accurately and reduce the training time of value iteration ADP, we propose a nearer optimal and faster trained value iteration ADP for discrete-time nonlinear systems. First, before training the model network with the backpropagation algorithm, we use a global search method, namely a genetic algorithm, to evolve the weights and biases of the neural network for a few generations. Second, in the outer-loop training process, we propose a trigger mechanism that decides whether to train the action network, which saves considerable training time. Examples of both linear and nonlinear systems are used to verify the superiority of the proposed method over conventional value iteration ADP. The simulation results show that the proposed algorithm provides a nearer optimal control policy and requires less training time than conventional value iteration ADP.
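The first step in the description (a few generations of genetic search before gradient training) can be illustrated with a minimal sketch. This is not the authors' implementation: the one-layer linear "model network", the plant x_next = 0.8 x + 0.5 u, the population size, the mutation scale, and every function name below are assumptions chosen only to keep the example short and runnable. Because this toy loss is convex, the sketch shows the two-stage workflow only, not the local-minimum benefit reported in the article.

```python
import random

def predict(params, x, u):
    # Tiny stand-in for a model network: x_next ~ w_x*x + w_u*u + b.
    w_x, w_u, b = params
    return w_x * x + w_u * u + b

def mse(params, data):
    # Mean squared one-step prediction error over the data set.
    return sum((predict(params, x, u) - y) ** 2 for x, u, y in data) / len(data)

def ga_warm_start(data, rng, pop_size=20, generations=5, sigma=0.3):
    # Evolve parameter vectors for a few generations (elitist selection,
    # uniform crossover, Gaussian mutation) and return the fittest one.
    pop = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: mse(p, data))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append([rng.choice(pair) + rng.gauss(0, sigma)
                             for pair in zip(a, b)])
        pop = elite + children
    return min(pop, key=lambda p: mse(p, data))

def gradient_descent(params, data, lr=0.05, steps=500):
    # Batch gradient descent on the MSE; for this one-layer model it is
    # what backpropagation reduces to.
    w_x, w_u, b = params
    n = len(data)
    for _ in range(steps):
        g_x = g_u = g_b = 0.0
        for x, u, y in data:
            err = w_x * x + w_u * u + b - y
            g_x += 2 * err * x / n
            g_u += 2 * err * u / n
            g_b += 2 * err / n
        w_x -= lr * g_x
        w_u -= lr * g_u
        b -= lr * g_b
    return [w_x, w_u, b]

rng = random.Random(0)
# Hypothetical plant used only to generate training data (an assumption).
data = []
for _ in range(50):
    x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append((x, u, 0.8 * x + 0.5 * u))

ga_init = ga_warm_start(data, rng)         # stage 1: global search
trained = gradient_descent(ga_init, data)  # stage 2: gradient refinement
print("loss after GA:", mse(ga_init, data))
print("loss after gradient descent:", mse(trained, data))
```

The genetic stage is kept elitist so that the best candidate never gets worse between generations; the gradient stage then starts from that candidate instead of a random draw.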
first_indexed 2024-12-17T05:13:10Z
format Article
id doaj.art-0d244ef891684464b56abf209f38778d
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-17T05:13:10Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-0d244ef891684464b56abf209f38778d
2022-12-21T22:02:12Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
14933
14944
10.1109/ACCESS.2021.3051984
9326299
A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems
Junping Hu
Gen Yang, https://orcid.org/0000-0001-8606-5523
Zhicheng Hou, https://orcid.org/0000-0002-5319-9856
Gong Zhang
Wenlin Yang, https://orcid.org/0000-0002-6725-182X
Weijun Wang, https://orcid.org/0000-0001-6011-2598
College of Mechanical and Electrical Engineering, Central South University, Changsha, China
College of Mechanical and Electrical Engineering, Central South University, Changsha, China
Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou, China
Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou, China
Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou, China
Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou, China
https://ieeexplore.ieee.org/document/9326299/
ADP
value iteration
genetic algorithm
trigger mechanism
title A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems
topic ADP
value iteration
genetic algorithm
trigger mechanism
url https://ieeexplore.ieee.org/document/9326299/