Off‐policy correction algorithm for double Q network based on deep reinforcement learning

Abstract: A deep reinforcement learning (DRL) method based on the deep deterministic policy gradient (DDPG) algorithm is proposed to address the mismatch between the needed training samples and the actual training samples during agent training, the overestimation and under...

Bibliographic Details
Main Authors: Qingbo Zhang, Manlu Liu, Heng Wang, Weimin Qian, Xinglang Zhang
Format: Article
Language: English
Published: Wiley 2023-12-01
Series: IET Cyber-systems and Robotics
Online Access: https://doi.org/10.1049/csy2.12102