CuMARL: Curiosity-Based Learning in Multiagent Reinforcement Learning
In this paper, we propose a novel curiosity-based learning algorithm for multi-agent reinforcement learning (MARL) to attain efficient and effective decision-making. We employ the centralized training with decentralized execution (CTDE) framework and assume that each agent knows the prior action distributions of the other agents. To quantify the difference in agents' knowledge, which we term curiosity, we introduce conditional mutual information (CMI) regularization and use the resulting information measure to update the decision-making policy. Then, to deploy this learning framework in large-scale MARL settings while retaining high sample efficiency, we adopt a Kullback-Leibler (KL) divergence-based prioritization of experiences. We evaluate the proposed algorithm on three difficulty levels of StarCraft Multi-Agent Challenge (SMAC) scenarios using the PyMARL framework. The simulation-based performance analysis shows that the proposed technique significantly improves the test win rate compared to state-of-the-art MARL benchmarks such as Optimistically Weighted Monotonic Value Function Factorization (OW_QMIX) and Learning Individual Intrinsic Reward (LIIR).
Main Authors: | Devarani Devi Ningombam, Byunghyun Yoo, Hyun Woo Kim, Hwa Jeon Song, Sungwon Yi |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access |
Subjects: | Multi-agent reinforcement learning; curiosity; conditional mutual information; prioritized experience replay |
Online Access: | https://ieeexplore.ieee.org/document/9857920/ |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2022.3198981 |
Citation: | IEEE Access, vol. 10, pp. 87254-87265, 2022 |
Collection: | DOAJ (record doaj.art-ea341c98ba52493b8fe43c46f77bd8a6) |
Author Affiliations: | Devarani Devi Ningombam: Department of Informatics, School of Computer Science, University of Petroleum and Energy Studies (UPES), Dehradun, Uttarakhand, India; Byunghyun Yoo, Hyun Woo Kim, Hwa Jeon Song, Sungwon Yi: Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea |
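The curiosity measure named in the abstract is built on conditional mutual information (CMI). As a reference point only (this record does not reproduce the paper's objective, and the notation a_i, a_{-i}, s is illustrative), the standard CMI between one agent's action and the other agents' joint action, given the state, can be written as:

```latex
% Conditional mutual information between agent i's action a_i and the other
% agents' joint action a_{-i}, given the state s (standard discrete-action
% definition; notation is illustrative, not taken from the paper itself).
I(a_i;\, a_{-i} \mid s)
  = \sum_{s} p(s) \sum_{a_i,\, a_{-i}}
      p(a_i, a_{-i} \mid s)\,
      \log \frac{p(a_i, a_{-i} \mid s)}{p(a_i \mid s)\, p(a_{-i} \mid s)}
```

A large value indicates that an agent's action choice is strongly coupled to its knowledge of the others' behavior, which matches the abstract's description of using "the amount of information" to drive policy updates.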
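The abstract also mentions KL divergence-based prioritization of experiences. Below is a minimal sketch of that general idea in Python, assuming priorities come from the KL divergence between the action distribution stored with a transition and the current policy's distribution; the class name `KLPrioritizedReplay`, the exponent `alpha`, and the API are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

class KLPrioritizedReplay:
    """Replay buffer that samples transitions with probability proportional
    to a KL-based priority raised to the power alpha. Hypothetical sketch of
    KL-divergence-based prioritization, not the paper's implementation."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha            # alpha = 0 -> uniform sampling
        self.buffer = []              # stored transitions
        self.priorities = []          # one KL-based priority per transition

    def add(self, transition, stored_dist, current_dist):
        """Priority is the KL divergence between the current policy's action
        distribution and the one recorded when the transition was collected:
        transitions the policy has since moved away from replay more often."""
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(kl_divergence(current_dist, stored_dist) + 1e-6)

    def sample(self, batch_size):
        """Draw a batch with probabilities proportional to priority**alpha."""
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]

# Toy usage: distributions over three discrete actions.
buf = KLPrioritizedReplay(capacity=1000)
buf.add(("s0", 1, 0.5, "s1"),
        stored_dist=[0.2, 0.5, 0.3],    # policy when transition was collected
        current_dist=[0.1, 0.7, 0.2])   # policy now
batch = buf.sample(batch_size=1)
```

Setting `alpha` closer to 0 washes out the priorities toward uniform replay, while values near 1 concentrate sampling on the transitions whose stored behavior differs most from the current policy.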