Greedy Action Selection and Pessimistic Q-Value Updating in Multi-Agent Reinforcement Learning with Sparse Interaction

Description

Although multi-agent reinforcement learning (MARL) is a promising method for learning a collaborative action policy that enables each agent to accomplish specified tasks, MARL suffers from a state-action space that grows exponentially with the number of agents. This state-action space can be dramatically reduced by assuming sparse interaction. We previously proposed three methods (greedily selecting actions, switching between Q-value update equations on the basis of the state of each agent in the next step, and their combination) for improving the performance of coordinating Q-learning (CQ-learning), a typical method for multi-agent reinforcement learning with sparse interaction. We have now modified the learning algorithm used in the combination of the first two methods so that it copes with interference among more than two agents. Evaluation of this enhanced method using two additional maze games from three perspectives (the number of steps to a goal, the number of augmented states, and the computational cost) demonstrated that the modified algorithm improves the performance of CQ-learning.
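
The abstract names two mechanisms that are easy to picture in code: greedy action selection and switching to a pessimistic Q-value update on the basis of each agent's next-step state. The sketch below is a minimal tabular illustration, assuming a generic Q-table and an externally supplied interference flag; the class name, the `interfered` argument, and the min-based pessimistic target are illustrative assumptions, not the authors' exact CQ-learning modification.

```python
import numpy as np

class SparseInteractionAgent:
    """Tabular learner combining greedy selection with a pessimistic update.

    A hypothetical sketch of the two ideas named in the abstract, not the
    paper's CQ-learning algorithm itself.
    """

    def __init__(self, n_states: int, n_actions: int,
                 alpha: float = 0.1, gamma: float = 0.95):
        self.q = np.zeros((n_states, n_actions))  # Q-table over (state, action)
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor

    def select_action(self, state: int) -> int:
        # Greedy action selection: always take the highest-valued action
        # (no epsilon-exploration), as in the first proposed method.
        return int(np.argmax(self.q[state]))

    def update(self, s: int, a: int, r: float, s_next: int,
               interfered: bool) -> None:
        # Switch between update equations based on the next-step state.
        if interfered:
            # Pessimistic target: value the next state by its worst action,
            # so states where agents interfere are scored conservatively.
            target = r + self.gamma * float(np.min(self.q[s_next]))
        else:
            # Standard Q-learning target for interference-free steps.
            target = r + self.gamma * float(np.max(self.q[s_next]))
        self.q[s, a] += self.alpha * (target - self.q[s, a])
```

In CQ-learning proper, the pessimistic branch would apply only in augmented states where interaction with another agent has been detected; here the boolean flag stands in for that detection step.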

Bibliographic Details
Main Authors: Toshihiro Kujirai, Takayoshi Yokota (Department of Information and Electronic, Graduate School of Engineering, Tottori University)
Format: Article
Language: English
Published: Taylor & Francis Group, 2019-05-01
Series: SICE Journal of Control, Measurement, and System Integration, Vol. 12, No. 3, pp. 76-84
ISSN: 1884-9970
Subjects: reinforcement learning, multi agent, sparse interaction, fully cooperative, maze games
Online Access: http://dx.doi.org/10.9746/jcmsi.12.76