Greedy Action Selection and Pessimistic Q-Value Updating in Multi-Agent Reinforcement Learning with Sparse Interaction
Although multi-agent reinforcement learning (MARL) is a promising method for learning a collaborative action policy that enables each agent to accomplish specified tasks, MARL suffers from a state-action space that grows exponentially with the number of agents. This state-action space can be dramatically reduced by assuming sparse interaction. We previously proposed three methods for improving the performance of coordinating Q-learning (CQ-learning), a representative method for MARL with sparse interaction: greedily selecting actions, switching between Q-value update equations on the basis of the state of each agent in the next step, and a combination of the two. We have now modified the learning algorithm of the combined method so that it can cope with interference among more than two agents. Evaluation of the enhanced method on two additional maze games from three perspectives (the number of steps to a goal, the number of augmented states, and the computational cost) demonstrated that the modified algorithm improves the performance of CQ-learning.
Main Authors: | Toshihiro Kujirai, Takayoshi Yokota |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2019-05-01 |
Series: | SICE Journal of Control, Measurement, and System Integration |
Subjects: | reinforcement learning; multi agent; sparse interaction; fully cooperative; maze games |
Online Access: | http://dx.doi.org/10.9746/jcmsi.12.76 |
author | Toshihiro Kujirai; Takayoshi Yokota |
collection | DOAJ |
description | Although multi-agent reinforcement learning (MARL) is a promising method for learning a collaborative action policy that enables each agent to accomplish specified tasks, MARL suffers from a state-action space that grows exponentially with the number of agents. This state-action space can be dramatically reduced by assuming sparse interaction. We previously proposed three methods for improving the performance of coordinating Q-learning (CQ-learning), a representative method for MARL with sparse interaction: greedily selecting actions, switching between Q-value update equations on the basis of the state of each agent in the next step, and a combination of the two. We have now modified the learning algorithm of the combined method so that it can cope with interference among more than two agents. Evaluation of the enhanced method on two additional maze games from three perspectives (the number of steps to a goal, the number of augmented states, and the computational cost) demonstrated that the modified algorithm improves the performance of CQ-learning. |
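The description names two mechanisms, greedy action selection and switching between Q-value update equations, without giving the equations themselves. The following Python sketch is a rough, non-authoritative illustration of how such a switch could look in tabular Q-learning; the `interfering` flag, the hyperparameters, and all function names are assumptions for illustration, not the paper's actual CQ-learning algorithm.

```python
from collections import defaultdict

# Illustrative sketch only: the paper's exact CQ-learning equations are not
# reproduced in this record; all names and hyperparameters are hypothetical.
ALPHA, GAMMA = 0.1, 0.95   # assumed learning rate and discount factor
Q = defaultdict(float)     # unseen (state, action) pairs default to 0.0

def greedy_action(state, actions):
    """Greedy action selection: always take the highest-valued action."""
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions, interfering):
    """Switch the Q-value update target based on the next-step state.

    Without interference, bootstrap optimistically from the best next
    action (standard Q-learning). When the next state is one where other
    agents interfere, bootstrap pessimistically from the worst next action,
    which keeps the agent from overvaluing contested states.
    """
    if interfering:
        bootstrap = min(Q[(next_state, a)] for a in actions)  # pessimistic
    else:
        bootstrap = max(Q[(next_state, a)] for a in actions)  # optimistic
    Q[(state, action)] += ALPHA * (reward + GAMMA * bootstrap - Q[(state, action)])
```

Under this reading, the pessimistic target acts as a hedge: in states flagged as shared with other agents, the learner assumes the worst-case continuation rather than the best-case one.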
format | Article |
id | doaj.art-40c02c5c441b4d1a8d75ba606c3e92da |
institution | Directory Open Access Journal |
issn | 1884-9970 |
language | English |
publishDate | 2019-05-01 |
publisher | Taylor & Francis Group |
series | SICE Journal of Control, Measurement, and System Integration |
spelling | Toshihiro Kujirai and Takayoshi Yokota (Department of Information and Electronic, Graduate School of Engineering, Tottori University), "Greedy Action Selection and Pessimistic Q-Value Updating in Multi-Agent Reinforcement Learning with Sparse Interaction," SICE Journal of Control, Measurement, and System Integration, vol. 12, no. 3, pp. 76-84, 2019-05-01, doi: 10.9746/jcmsi.12.76 |
title | Greedy Action Selection and Pessimistic Q-Value Updating in Multi-Agent Reinforcement Learning with Sparse Interaction |
topic | reinforcement learning; multi agent; sparse interaction; fully cooperative; maze games |
url | http://dx.doi.org/10.9746/jcmsi.12.76 |