Off‐policy correction algorithm for double Q network based on deep reinforcement learning

Abstract: A deep reinforcement learning (DRL) method based on the deep deterministic policy gradient (DDPG) algorithm is proposed to address the problems of a mismatch between the needed training samples and the actual training samples during the training of intelligence, the overestimation and under...

Bibliographic Details
Main Authors:	Qingbo Zhang, Manlu Liu, Heng Wang, Weimin Qian, Xinglang Zhang
Format:	Article
Language:	English
Published:	Wiley 2023-12-01
Series:	IET Cyber-systems and Robotics
Subjects:	neural network; Q‐learning; reinforcement learning
Online Access:	https://doi.org/10.1049/csy2.12102

Similar Items