Greedy Action Selection and Pessimistic Q-Value Updating in Multi-Agent Reinforcement Learning with Sparse Interaction
Although multi-agent reinforcement learning (MARL) is a promising method for learning a collaborative action policy that enables each agent to accomplish specified tasks, MARL suffers from a state-action space that grows exponentially with the number of agents. This state-action space can be dramatically reduced by assuming sparse interaction. We previously proposed three methods for improving the performance of coordinating Q-learning (CQ-learning), a representative method for MARL with sparse interaction: greedily selecting actions, switching between Q-value update equations on the basis of the state of each agent in the next step, and a combination of the two. We have now modified the learning algorithm of the combined method so that it can cope with interference among more than two agents. Evaluation of the enhanced method on two additional maze games from three perspectives (the number of steps to a goal, the number of augmented states, and the computational cost) demonstrated that the modified algorithm improves the performance of CQ-learning.
Main Authors: | Toshihiro Kujirai, Takayoshi Yokota |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2019-05-01 |
Series: | SICE Journal of Control, Measurement, and System Integration |
Subjects: | reinforcement learning; multi agent; sparse interaction; fully cooperative; maze games |
Online Access: | http://dx.doi.org/10.9746/jcmsi.12.76 |
author | Toshihiro Kujirai; Takayoshi Yokota |
collection | DOAJ |
description | Although multi-agent reinforcement learning (MARL) is a promising method for learning a collaborative action policy that enables each agent to accomplish specified tasks, MARL suffers from a state-action space that grows exponentially with the number of agents. This state-action space can be dramatically reduced by assuming sparse interaction. We previously proposed three methods for improving the performance of coordinating Q-learning (CQ-learning), a representative method for MARL with sparse interaction: greedily selecting actions, switching between Q-value update equations on the basis of the state of each agent in the next step, and a combination of the two. We have now modified the learning algorithm of the combined method so that it can cope with interference among more than two agents. Evaluation of the enhanced method on two additional maze games from three perspectives (the number of steps to a goal, the number of augmented states, and the computational cost) demonstrated that the modified algorithm improves the performance of CQ-learning. |
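The description names two mechanisms, greedy action selection and switching between Q-value update equations, without giving the equations themselves. The following Python sketch is a rough, non-authoritative illustration of how such a switch could look in tabular Q-learning; the `interfering` flag, the hyperparameters, and all function names are assumptions for illustration, not the paper's actual CQ-learning algorithm.

```python
from collections import defaultdict

# Illustrative sketch only: the paper's exact CQ-learning equations are not
# reproduced in this record; all names and hyperparameters are hypothetical.
ALPHA, GAMMA = 0.1, 0.95   # assumed learning rate and discount factor
Q = defaultdict(float)     # unseen (state, action) pairs default to 0.0

def greedy_action(state, actions):
    """Greedy action selection: always take the highest-valued action."""
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions, interfering):
    """Switch the Q-value update target based on the next-step state.

    Without interference, bootstrap optimistically from the best next
    action (standard Q-learning). When the next state is one where other
    agents interfere, bootstrap pessimistically from the worst next action,
    which keeps the agent from overvaluing contested states.
    """
    if interfering:
        bootstrap = min(Q[(next_state, a)] for a in actions)  # pessimistic
    else:
        bootstrap = max(Q[(next_state, a)] for a in actions)  # optimistic
    Q[(state, action)] += ALPHA * (reward + GAMMA * bootstrap - Q[(state, action)])
```

Under this reading, the pessimistic target acts as a hedge: in states flagged as shared with other agents, the learner assumes the worst-case continuation rather than the best-case one.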
format | Article |
id | doaj.art-40c02c5c441b4d1a8d75ba606c3e92da |
institution | Directory Open Access Journal |
issn | 1884-9970 |
language | English |
publishDate | 2019-05-01 |
publisher | Taylor & Francis Group |
series | SICE Journal of Control, Measurement, and System Integration |
spelling | Toshihiro Kujirai and Takayoshi Yokota (Department of Information and Electronic, Graduate School of Engineering, Tottori University), "Greedy Action Selection and Pessimistic Q-Value Updating in Multi-Agent Reinforcement Learning with Sparse Interaction," SICE Journal of Control, Measurement, and System Integration, vol. 12, no. 3, pp. 76-84, 2019-05-01, doi: 10.9746/jcmsi.12.76 |
title | Greedy Action Selection and Pessimistic Q-Value Updating in Multi-Agent Reinforcement Learning with Sparse Interaction |
topic | reinforcement learning; multi agent; sparse interaction; fully cooperative; maze games |
url | http://dx.doi.org/10.9746/jcmsi.12.76 |