Achieving Robustness and Generalization in MARL for Sequential Social Dilemmas through Bilinear Value Networks

This thesis presents a novel approach for training multi-agent reinforcement learning (MARL) agents that are robust to different unforeseen gameplay strategies in sequential social dilemma (SSD) games. Recent literature has demonstrated that reward shaping can not only be used to enable MARL agents...

Bibliographic Details
Main Author:	Ma, Jeremy
Other Authors:	How, Jonathan P.
Format:	Thesis
Published:	Massachusetts Institute of Technology 2023
Online Access:	https://hdl.handle.net/1721.1/152745

_version_	1826194030481899520
author	Ma, Jeremy
author2	How, Jonathan P.
author_facet	How, Jonathan P. Ma, Jeremy
author_sort	Ma, Jeremy
collection	MIT
description	This thesis presents a novel approach for training multi-agent reinforcement learning (MARL) agents that are robust to unforeseen gameplay strategies in sequential social dilemma (SSD) games. Recent literature has demonstrated that reward shaping can be used not only to enable MARL agents to discover diverse, human-interpretable strategies with emergent qualities, but also to help alleviate the tendency of conventional actor-critic methods to converge to suboptimal Nash equilibria in SSD games. However, agents trained through self-play typically converge and overfit to a single Nash equilibrium. Consequently, these agents are limited to executing the specific strategy they converged to during training, which renders them ineffective when faced with opponents employing commonly used strategies such as tit-for-tat. This thesis proposes a method that employs a bilinear value critic that can learn an adaptive and robust strategy in SSD games through self-play with randomized reward sharing. We evaluate the efficacy of this approach on “prisoner’s buddy,” an iterated three-player variant of the prisoner’s dilemma game. Our results show that the bilinear value structure helps the critic generalize over the reward sharing manifold and leads to an adaptive agent with emergent qualities such as reputation. The results highlight the ability of MARL agents to learn a general high-level policy that can effectively socialize with agents using different strategies in SSD games, despite being trained through self-play. The proposed method is scalable and has the potential to be applied to a wide range of multi-agent competitive-cooperative environments, providing insights into the design of MARL algorithms for solving social dilemmas.
first_indexed	2024-09-23T09:49:26Z
format	Thesis
id	mit-1721.1/152745
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T09:49:26Z
publishDate	2023
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/152745 2023-11-03T03:54:33Z Ma, Jeremy How, Jonathan P. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2023-11-02T20:12:45Z 2023-11-02T20:12:45Z 2023-09 2023-10-03T18:21:28.799Z Thesis https://hdl.handle.net/1721.1/152745 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle	Ma, Jeremy Achieving Robustness and Generalization in MARL for Sequential Social Dilemmas through Bilinear Value Networks
title	Achieving Robustness and Generalization in MARL for Sequential Social Dilemmas through Bilinear Value Networks
title_sort	achieving robustness and generalization in marl for sequential social dilemmas through bilinear value networks
url	https://hdl.handle.net/1721.1/152745
work_keys_str_mv	AT majeremy achievingrobustnessandgeneralizationinmarlforsequentialsocialdilemmasthroughbilinearvaluenetworks

Similar Items