Learning Effective and Human-like Policies for Strategic, Multi-Agent Games

We consider the task of building effective but human-like policies in multi-agent decision-making problems. Imitation learning (IL) is effective at predicting human actions but may not match the strength of expert humans, while reinforcement learning (RL) and search algorithms lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We first study the problem of producing human-like communication in latent language policies (LLPs), in which high-level instructor and low-level executor agents communicate using natural language. While LLPs can solve long-horizon RL problems, past work has found that LLP training produces agents that use messages in ways inconsistent with their natural language meanings. We introduce a sample-efficient multitask training scheme that yields human-like communication in a complex real-time strategy game. We then turn to the problem of producing human-like decision-making in a more general class of policies. We develop a regret-minimization algorithm for imperfect-information games that can leverage human demonstrations. We show that using this algorithm for search in no-press Diplomacy yields a policy that matches the human-likeness of IL while achieving much higher reward.

This thesis is based on the papers "Multitasking Inhibits Semantic Drift," published at NAACL 2021, and "Modeling Strong and Human-Like Gameplay with KL-Regularized Search," currently under review for publication at ICML 2022. The contents of these papers are used with the permission of co-authors David J. Wu, Gabriele Farina, Adam Lerer, Hengyuan Hu, Anton Bakhtin, Mike Lewis, Noam Brown, and Jacob Andreas.
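To make the abstract's central idea concrete: one simple way to bias a no-regret learner toward a human imitation policy is to add a lambda-weighted log-probability bonus from a fixed anchor policy to each action's utility before a Hedge-style update. The following is a minimal illustrative sketch only, not the thesis's actual algorithm; it assumes a single decision point with a fixed utility vector, and all names (tau, lam, eta) are hypothetical. The thesis's setting additionally involves search in imperfect-information games.

    import numpy as np

    def kl_regularized_hedge(utils, tau, lam=1.0, eta=0.1, iters=1000):
        """Hedge with an anchor-policy bonus (illustrative sketch).
        utils: (n,) expected utility of each action (fixed here for
               simplicity; in self-play these would be re-estimated
               each iteration).
        tau:   (n,) anchor policy, e.g. from imitation learning on
               human demonstrations.
        lam:   strength of the pull toward tau; lam=0 recovers plain
               Hedge, large lam collapses the policy onto tau."""
        n = len(utils)
        cum = np.zeros(n)            # cumulative regularized utilities
        avg = np.zeros(n)            # running average of played policies
        for t in range(1, iters + 1):
            cum += utils + lam * np.log(tau)   # anchor bonus each step
            logits = eta * cum
            pi = np.exp(logits - logits.max()) # numerically stable softmax
            pi /= pi.sum()
            avg += (pi - avg) / t              # incremental average policy
        return avg

    # Toy example: action 2 has the highest utility, but the anchor
    # prefers action 0; lam trades off reward against human-likeness.
    utils = np.array([0.0, 0.5, 1.0])
    tau = np.array([0.7, 0.2, 0.1])
    for lam in (0.0, 0.5, 5.0):
        print(lam, np.round(kl_regularized_hedge(utils, tau, lam=lam), 3))

At lam=0 the averaged policy concentrates on the highest-utility action; at large lam it tracks the human anchor, matching the trade-off the abstract describes between reward and human-likeness.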


Bibliographic Details
Main Author: Jacob, Athul Paul
Other Authors: Brown, Noam; Andreas, Jacob
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Thesis
Degree: S.M.
Published: Massachusetts Institute of Technology, 2022
Rights: In Copyright - Educational Use Permitted (http://rightsstatements.org/page/InC-EDU/1.0/)
Online Access: https://hdl.handle.net/1721.1/144569