A new noise network and gradient parallelisation‐based asynchronous advantage actor‐critic algorithm
Abstract The asynchronous advantage actor‐critic (A3C) algorithm is a commonly used policy optimization algorithm in reinforcement learning, in which "asynchronous" refers to parallel interactive sampling and training, and "advantage" refers to a sampled multi‐step reward estimation method for computing weights. In orde...
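The "advantage" term in the abstract can be illustrated with a minimal sketch of the textbook multi-step estimate: discounted rewards are accumulated backwards from a bootstrap value, and the critic's value prediction is subtracted at each step. This is an illustration of the general A3C idea only, not the article's noise-network or gradient-parallelisation variant; the function name `n_step_advantages` and its parameters are hypothetical.

```python
from typing import List, Tuple


def n_step_advantages(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Compute multi-step returns and advantages for one rollout segment.

    rewards[t] is the reward received at step t, values[t] is the critic's
    estimate for the state at step t, and bootstrap_value is the critic's
    estimate for the state after the last step (0.0 if the episode ended).
    All names here are assumptions made for illustration.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    returns: List[float] = []
    running = bootstrap_value
    # Walk the segment backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()

    # Advantage = multi-step return minus the critic's value estimate.
    advantages = [g - v for g, v in zip(returns, values)]
    return returns, advantages


if __name__ == "__main__":
    rets, advs = n_step_advantages(
        rewards=[1.0, 1.0, 1.0],
        values=[0.5, 0.5, 0.5],
        bootstrap_value=0.0,
        gamma=0.5,
    )
    print(rets, advs)
```

In an A3C-style setup, each parallel worker would compute such advantages on its own rollout and use them to weight its policy-gradient update before sending gradients to the shared model.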
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2022-09-01 |
Series: | IET Cyber-systems and Robotics |
Online Access: | https://doi.org/10.1049/csy2.12059 |