A new noise network and gradient parallelisation‐based asynchronous advantage actor‐critic algorithm

Abstract: The asynchronous advantage actor‐critic (A3C) algorithm is a commonly used policy optimization algorithm in reinforcement learning, in which "asynchronous" refers to parallel interactive sampling and training, and "advantage" refers to a sampled multi‐step reward estimate used to compute weights. In orde...

Bibliographic Details
Main Authors: Zhengshun Fei, Yanping Wang, Jinglong Wang, Kangling Liu, Bingqiang Huang, Ping Tan
Format: Article
Language: English
Published: Wiley 2022-09-01
Series: IET Cyber-systems and Robotics
Online Access: https://doi.org/10.1049/csy2.12059