Monotonic value function factorisation for deep multi-agent reinforcement learning
In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values, through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly outperforms existing multi-agent reinforcement learning methods.
Main Authors: | Rashid, T; Samvelyan, M; Schröder de Witt, C; Farquhar, G; Foerster, JN; Whiteson, S |
---|---|
Format: | Journal article |
Language: | English |
Published: | Journal of Machine Learning Research, 2020 |
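The abstract's central architectural idea is that the joint value Q_tot is a state-conditioned, monotonic mixture of the per-agent values, i.e. ∂Q_tot/∂Q_a ≥ 0 for every agent a, enforced by keeping the mixing weights non-negative. The sketch below illustrates one way such a mixing network can be written in PyTorch. It is a minimal sketch under assumptions: the class name `MonotonicMixer`, the layer sizes, and the hypernetwork structure are illustrative choices, not the authors' reference implementation.

```python
# Minimal sketch of a QMIX-style monotonic mixing network (PyTorch).
# Assumption: mixing weights are produced by state-conditioned hypernetworks,
# and non-negativity (hence monotonicity of Q_tot in each agent's Q) is
# enforced by taking the absolute value of those weights.
import torch
import torch.nn as nn


class MonotonicMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to the mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) chosen per-agent Q-values
        # state:    (batch, state_dim) global state available during training
        bs = agent_qs.size(0)
        qs = agent_qs.view(bs, 1, self.n_agents)

        # abs() keeps the first-layer mixing weights non-negative.
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(qs, w1) + b1)

        # Second mixing layer, again with non-negative weights.
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2
        return q_tot.view(bs, 1)


# Usage example (illustrative shapes): 3 agents, 10-dim state, batch of 4.
mixer = MonotonicMixer(n_agents=3, state_dim=10)
q_tot = mixer(torch.rand(4, 3), torch.rand(4, 10))  # -> shape (4, 1)
```

Because every weight applied to the per-agent values is non-negative, increasing any individual Q_a can only increase Q_tot, so the greedy decentralised actions of the agents jointly maximise Q_tot, which is the consistency property the abstract refers to.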