An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks
Multi-agent reinforcement learning (MARL) for cooperative tasks has been extensively researched over the past decade. The prevalent framework for MARL algorithms is centralized training and decentralized execution. Q-learning is often employed as a centralized learner. ...
Main Authors: | Dengyu Liao, Zhen Zhang, Tingting Song, Mingyang Liu |
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Multi-agent reinforcement learning; reinforcement learning; Q-learning; multi-agent system |
Online Access: | https://ieeexplore.ieee.org/document/10348557/ |
_version_ | 1797376384296812544 |
author | Dengyu Liao; Zhen Zhang; Tingting Song; Mingyang Liu |
author_facet | Dengyu Liao; Zhen Zhang; Tingting Song; Mingyang Liu |
author_sort | Dengyu Liao |
collection | DOAJ |
description | Multi-agent reinforcement learning (MARL) for cooperative tasks has been extensively researched over the past decade. The prevalent framework for MARL algorithms is centralized training and decentralized execution. Q-learning is often employed as a centralized learner. However, it requires finding the maximum value by comparing the Q-values of all joint actions a' in the next state s' in order to update the Q-value of the last visited state-action pair (s, a). When the joint action space is large, this maximization, carried out through comparisons, becomes time-consuming and dominates the computational cost of the algorithm. To tackle this issue, we propose an algorithm that reduces the number of comparisons by saving the joint actions with the top 2 Q-values (T2Q). Updating the top 2 Q-values involves seven cases, and in five of them the T2Q algorithm avoids traversing the Q-table to update the Q-value, thus alleviating the computational burden. Theoretical analysis demonstrates that the upper bound on the expected ratio of the number of comparisons made by T2Q to that made by Q-learning decreases as the number of agents increases. Simulation results on two-stage stochastic games are consistent with the theoretical analysis. Furthermore, the effectiveness of the T2Q algorithm is validated on a distributed sensor network task and a target transportation task. The T2Q algorithm completes both tasks with a 100% success rate and minimal computational overhead. |
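The description sketches the key data structure: for every state, cache the two joint actions with the largest Q-values so that max over a' of Q(s', a') can be read in constant time, falling back to a full traversal of the joint action space only when the cache can no longer be trusted. The snippet below is a minimal Python sketch of that bookkeeping under these assumptions; the class and method names (T2QLearner, _scan, top2) are illustrative, and the paper's exact seven-case analysis may group the cases differently.

```python
from collections import defaultdict


class T2QLearner:
    """Hypothetical sketch of Q-learning with a per-state top-2 cache (not the authors' code)."""

    def __init__(self, joint_actions, alpha=0.1, gamma=0.95):
        self.joint_actions = list(joint_actions)   # all joint actions; assumed to contain >= 2 entries
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)                # Q[(state, joint_action)], defaults to 0.0
        self.top2 = {}                             # state -> [(a1, q1), (a2, q2)] with q1 >= q2

    def _scan(self, s):
        """Fallback: traverse the joint action space to rebuild the top-2 cache for state s."""
        ranked = sorted(self.joint_actions, key=lambda a: self.q[(s, a)], reverse=True)
        self.top2[s] = [(ranked[0], self.q[(s, ranked[0])]),
                        (ranked[1], self.q[(s, ranked[1])])]

    def max_next(self, s):
        """Max over joint actions of Q(s, .), read from the cache in O(1)."""
        if s not in self.top2:
            self._scan(s)
        return self.top2[s][0][1]

    def update(self, s, a, r, s_next):
        new_q = self.q[(s, a)] + self.alpha * (
            r + self.gamma * self.max_next(s_next) - self.q[(s, a)])
        self.q[(s, a)] = new_q
        if s not in self.top2:
            self._scan(s)
            return
        (a1, q1), (a2, q2) = self.top2[s]
        if a == a1:                                    # updated the cached best action
            if new_q >= q2:
                self.top2[s] = [(a1, new_q), (a2, q2)]     # still the best: no traversal
            else:
                self._scan(s)                              # dropped below the runner-up: traversal
        elif a == a2:                                  # updated the cached second-best action
            if new_q > q1:
                self.top2[s] = [(a2, new_q), (a1, q1)]     # overtook the best: swap, no traversal
            elif new_q >= q2:
                self.top2[s] = [(a1, q1), (a2, new_q)]     # value rose but stays second: no traversal
            else:
                self._scan(s)                              # may have fallen behind a third action
        else:                                          # updated an action outside the cache
            if new_q > q1:
                self.top2[s] = [(a, new_q), (a1, q1)]      # new best: no traversal
            elif new_q > q2:
                self.top2[s] = [(a1, q1), (a, new_q)]      # new second best: no traversal
            # otherwise the cache is unchanged: no traversal
```

Only the branches that call _scan traverse the joint action space; every other branch touches a constant number of entries, which is where the reduction in comparisons described in the abstract comes from.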
first_indexed | 2024-03-08T19:37:46Z |
format | Article |
id | doaj.art-e59ac58d3db7478f9900b50c1f100a78 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T19:37:46Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-e59ac58d3db7478f9900b50c1f100a78 | 2023-12-26T00:08:31Z | eng | IEEE | IEEE Access | 2169-3536 | 2023-01-01 | Vol. 11, pp. 139284-139294 | doi:10.1109/ACCESS.2023.3340867 | 10348557 | An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks | Dengyu Liao (Shandong Key Laboratory of Industrial Control Technology, School of Automation, Qingdao University, Qingdao, China); Zhen Zhang (https://orcid.org/0000-0002-6615-629X; Shandong Key Laboratory of Industrial Control Technology, School of Automation, Qingdao University, Qingdao, China); Tingting Song (Qingdao Metro Group Company Ltd., Operating Branch, Qingdao, China); Mingyang Liu (Shandong Key Laboratory of Industrial Control Technology, School of Automation, Qingdao University, Qingdao, China) | Multi-agent reinforcement learning (MARL) for cooperative tasks has been extensively researched over the past decade. The prevalent framework for MARL algorithms is centralized training and decentralized execution. Q-learning is often employed as a centralized learner. However, it requires finding the maximum value by comparing the Q-values of all joint actions a' in the next state s' in order to update the Q-value of the last visited state-action pair (s, a). When the joint action space is large, this maximization, carried out through comparisons, becomes time-consuming and dominates the computational cost of the algorithm. To tackle this issue, we propose an algorithm that reduces the number of comparisons by saving the joint actions with the top 2 Q-values (T2Q). Updating the top 2 Q-values involves seven cases, and in five of them the T2Q algorithm avoids traversing the Q-table to update the Q-value, thus alleviating the computational burden. Theoretical analysis demonstrates that the upper bound on the expected ratio of the number of comparisons made by T2Q to that made by Q-learning decreases as the number of agents increases. Simulation results on two-stage stochastic games are consistent with the theoretical analysis. Furthermore, the effectiveness of the T2Q algorithm is validated on a distributed sensor network task and a target transportation task. The T2Q algorithm completes both tasks with a 100% success rate and minimal computational overhead. | https://ieeexplore.ieee.org/document/10348557/ | Multi-agent reinforcement learning; reinforcement learning; Q-learning; multi-agent system |
spellingShingle | Dengyu Liao; Zhen Zhang; Tingting Song; Mingyang Liu | An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks | IEEE Access | Multi-agent reinforcement learning; reinforcement learning; Q-learning; multi-agent system |
title | An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks |
title_full | An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks |
title_fullStr | An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks |
title_full_unstemmed | An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks |
title_short | An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks |
title_sort | efficient centralized multi agent reinforcement learner for cooperative tasks |
topic | Multi-agent reinforcement learning; reinforcement learning; Q-learning; multi-agent system |
url | https://ieeexplore.ieee.org/document/10348557/ |
work_keys_str_mv | AT dengyuliao anefficientcentralizedmultiagentreinforcementlearnerforcooperativetasks AT zhenzhang anefficientcentralizedmultiagentreinforcementlearnerforcooperativetasks AT tingtingsong anefficientcentralizedmultiagentreinforcementlearnerforcooperativetasks AT mingyangliu anefficientcentralizedmultiagentreinforcementlearnerforcooperativetasks AT dengyuliao efficientcentralizedmultiagentreinforcementlearnerforcooperativetasks AT zhenzhang efficientcentralizedmultiagentreinforcementlearnerforcooperativetasks AT tingtingsong efficientcentralizedmultiagentreinforcementlearnerforcooperativetasks AT mingyangliu efficientcentralizedmultiagentreinforcementlearnerforcooperativetasks |