Off‐policy correction algorithm for double Q network based on deep reinforcement learning
Abstract: A deep reinforcement learning (DRL) method based on the deep deterministic policy gradient (DDPG) algorithm is proposed to address the problems of a mismatch between the needed training samples and the actual training samples during the training of intelligence, the overestimation and under...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2023-12-01 |
Series: | IET Cyber-systems and Robotics |
Subjects: | |
Online Access: | https://doi.org/10.1049/csy2.12102 |