A new noise network and gradient parallelisation‐based asynchronous advantage actor‐critic algorithm
Abstract: The asynchronous advantage actor-critic (A3C) algorithm is a widely used policy optimisation algorithm in reinforcement learning, in which "asynchronous" refers to parallel interactive sampling and training, and "advantage" refers to a multi-step reward estimate used to weight policy updates. To address the low efficiency and poor convergence caused by the traditional heuristic exploration in A3C, an improved A3C algorithm is proposed in this paper. In the proposed algorithm, a noise network whose noise tensor is updated in an explicit way is constructed to train the agent. Generalised advantage estimation (GAE) is adopted to compute the advantage function. Finally, a new mean gradient parallelisation method is designed to update the parameters of both the primary and secondary networks by summing and averaging the gradients passed from all sub-processes to the main process. Simulation experiments were conducted in a Gym environment using the PyTorch Agent Net (PTAN) reinforcement learning library; the results show that the improved method enables the agent to complete training faster and converge more stably. The improved A3C algorithm outperforms the original algorithm and can provide new ideas for subsequent research on reinforcement learning algorithms.
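The abstract describes three techniques: exploration through a noise network whose noise tensor is updated explicitly, generalised advantage estimation, and a mean gradient parallelisation step that averages worker gradients before updating the shared network. The sketch below is a minimal illustration of how these pieces are commonly written in PyTorch, not the authors' implementation; the class and function names and the hyperparameter values (sigma0, gamma, lam) are assumptions made for illustration.

```python
# Minimal sketch (not the paper's code) of: an explicitly resampled noisy layer,
# generalised advantage estimation (GAE), and mean-gradient aggregation across
# worker processes. All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Module):
    """Linear layer with learnable noise scales; the noise tensors are resampled
    only when reset_noise() is called explicitly by the training loop."""

    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.017):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma0))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma0))
        self.register_buffer("weight_eps", torch.zeros(out_features, in_features))
        self.register_buffer("bias_eps", torch.zeros(out_features))

    def reset_noise(self) -> None:
        # Explicit update of the noise tensors.
        self.weight_eps.normal_()
        self.bias_eps.normal_()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight_mu + self.weight_sigma * self.weight_eps
        bias = self.bias_mu + self.bias_sigma * self.bias_eps
        return F.linear(x, weight, bias)


def gae_advantages(rewards, values, dones, gamma: float = 0.99, lam: float = 0.95):
    """Generalised advantage estimation over one rollout; `values` holds one extra
    bootstrap value for the state following the last transition."""
    advantages = torch.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        mask = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        running = delta + gamma * lam * mask * running
        advantages[t] = running
    return advantages


def apply_mean_gradients(shared_net: nn.Module, worker_grads, optimizer) -> None:
    """Sum and average the gradients collected from all sub-processes, then make a
    single update to the shared (primary) network; workers copy its weights back."""
    optimizer.zero_grad()
    for param, grads in zip(shared_net.parameters(), zip(*worker_grads)):
        param.grad = sum(grads) / len(worker_grads)
    optimizer.step()
```

In this reading, each sub-process collects a rollout, computes GAE-weighted actor-critic gradients with its local copy of the network, and sends the gradient tensors to the main process, which averages them and performs one optimiser step on the shared network before broadcasting the new parameters.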
Main Authors: | Zhengshun Fei, Yanping Wang, Jinglong Wang, Kangling Liu, Bingqiang Huang, Ping Tan |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2022-09-01 |
Series: | IET Cyber-systems and Robotics |
ISSN: | 2631-6315 |
Subjects: | asynchronous advantage actor-critic (A3C); generalised advantage estimation (GAE); parallelisation; reinforcement learning |
Online Access: | https://doi.org/10.1049/csy2.12059 |
Volume / Issue / Pages: | Vol. 4, No. 3, pp. 175-188 |
Author Affiliations: | Zhengshun Fei, Yanping Wang, Jinglong Wang, Bingqiang Huang, Ping Tan: Provincial Key Institute of Robotics, School of Automation and Electrical Engineering, Zhejiang University of Science and Technology, Hangzhou, China; Kangling Liu: State Key Lab of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China |