Robust multi-agent team behaviors in uncertain environment via reinforcement learning

Many state-of-the-art cooperative multi-agent reinforcement learning (MARL) approaches, such as MADDPG, COMA, and QMIX, have focused mainly on performing well in idealized scenarios, where agents face the same environmental conditions and opponents encountered during training. The resulting policies often overfit to the training environment and are brittle, so they cannot be easily deployed outside the laboratory. While adversarial learning is one way to train robust policies, most such work has focused on single-agent RL and adversarial perturbations of a static environment. The robust MARL methods that do build on adversarial training target specialized settings: M3DDPG assumes the extreme case in which all other agents are adversarial, while Phan et al. consider agents that malfunction and turn adversarial. Many of these methods also sacrifice team coordination to achieve robustness, and little emphasis has been placed on maintaining good coordination while ensuring robustness. This is a clear gap: robustness should be a design objective of a MARL algorithm alongside performance, rather than an afterthought.

This work focuses on learning robust team policies that perform well even when the environment and opponent behaviour differ significantly from those seen during training. We propose the Signal-mediated Team Maxmin (STeaM) framework, an end-to-end MARL framework that approximates the game-theoretic solution concept of a team-maxmin equilibrium with a correlation device (TMECor) to address both agent coordination and policy robustness. STeaM uses a pre-agreed signal to coordinate team actions and approximates TMECor policies through consistency and diversity regularizations, combined with a best-response gradient-descent self-play equilibrium-learning procedure.

Our experiments show that STeaM can learn team policies that approximate TMECor well. These policies consistently achieve higher rewards in adversarial and uncertain situations than policies produced by other state-of-the-art models, and they exhibit bounded performance degradation when tested against previously unseen opponent policies.
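For context, TMECor is commonly formulated as follows (this is a standard statement of the solution concept from the game-theory literature; the notation below is illustrative and not taken from the thesis). A team of $m$ agents commits, ex ante, to a correlated distribution $\mu$ over joint deterministic plans, realized at play time through a shared signal from a correlation device, and the adversary best-responds:

$$
\max_{\mu \,\in\, \Delta(\Pi_1 \times \cdots \times \Pi_m)} \;\; \min_{\pi_A \,\in\, \Pi_A} \;\; \mathbb{E}_{(\pi_1,\ldots,\pi_m) \sim \mu}\big[\, u_T(\pi_1,\ldots,\pi_m,\pi_A) \,\big],
$$

where $\Pi_i$ is agent $i$'s set of deterministic plans, $\Pi_A$ is the adversary's strategy set, and $u_T$ is the team's expected utility. In the abstract's description, the pre-agreed signal plays the role of the correlation device.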

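As a concrete, purely illustrative reading of that description, the sketch below shows one way a signal-conditioned team policy with consistency and diversity regularizers could be set up in PyTorch. The network architecture, the specific regularizer forms (low entropy for consistency, pairwise KL divergence for diversity), and all names are assumptions made for illustration, not the thesis' actual implementation.

```python
# Hypothetical sketch of a signal-conditioned team policy with consistency and
# diversity regularizers, in the spirit of the STeaM description above.
# Sizes, regularizer forms, and names are illustrative assumptions.
import torch
import torch.nn as nn


class SignalConditionedPolicy(nn.Module):
    """Maps (observation, shared signal) to an action distribution for one team agent."""

    def __init__(self, obs_dim, n_signals, n_actions, hidden=64):
        super().__init__()
        # The pre-agreed signal is a discrete value sampled once per episode and
        # shared by all team agents; it stands in for the correlation device.
        self.signal_emb = nn.Embedding(n_signals, hidden)
        self.net = nn.Sequential(
            nn.Linear(obs_dim + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, signal):
        z = self.signal_emb(signal)                      # (batch, hidden)
        logits = self.net(torch.cat([obs, z], dim=-1))   # (batch, n_actions)
        return torch.distributions.Categorical(logits=logits)


def consistency_loss(dist):
    # Encourage near-deterministic behaviour for a fixed signal (low entropy),
    # so that a given signal reliably pins down the team's joint plan.
    return dist.entropy().mean()


def diversity_loss(policy, obs, n_signals):
    # Encourage different signals to induce visibly different behaviour:
    # minimise the negative mean pairwise KL between action distributions
    # conditioned on distinct signals.
    batch = obs.shape[0]
    dists = [policy(obs, torch.full((batch,), s, dtype=torch.long))
             for s in range(n_signals)]
    kl = obs.new_zeros(())
    for i in range(n_signals):
        for j in range(n_signals):
            if i != j:
                kl = kl + torch.distributions.kl_divergence(dists[i], dists[j]).mean()
    return -kl / max(n_signals * (n_signals - 1), 1)


if __name__ == "__main__":
    # Tiny smoke test with made-up dimensions.
    policy = SignalConditionedPolicy(obs_dim=8, n_signals=4, n_actions=5)
    obs = torch.randn(16, 8)
    signal = torch.randint(0, 4, (16,))
    dist = policy(obs, signal)
    loss = consistency_loss(dist) + 0.1 * diversity_loss(policy, obs, n_signals=4)
    loss.backward()
```

In a self-play loop of the kind the abstract describes, a team update would combine the return obtained against the current best-response opponent with weighted consistency and diversity terms, and the opponent would periodically be re-trained as a best response to the current team policy.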

Bibliographic Details
Main Author: Yan, Kok Hong
Other Authors: Bo An (School of Computer Science and Engineering, boan@ntu.edu.sg)
Format: Thesis-Master by Research
Degree: Master of Engineering
Language: English
Published: Nanyang Technological University, 2022
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Citation: Yan, K. H. (2022). Robust multi-agent team behaviors in uncertain environment via reinforcement learning. Master's thesis, Nanyang Technological University, Singapore.
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Online Access: https://hdl.handle.net/10356/159448