UneVEn: Universal value exploration for multi-agent reinforcement learning
VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted joint action value function can prevent them from solvi...
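The abstract's point about easy decentralization can be checked with a small sketch. It assumes tabular per-agent utilities and two mixers. The first is the additive VDN mixer. The second is a hypothetical single-layer monotonic mixer that only illustrates QMIX's non-negative-weight constraint; real QMIX uses state-conditioned hypernetworks and a nonlinear hidden layer. All function names here are illustrative. Because each mixer is monotonic in every agent's utility, letting each agent take its own greedy action reaches the same joint value as a brute-force search over all joint actions.

```python
import itertools
import random


def vdn_mix(utilities):
    """VDN-style mixing: the joint value is the sum of per-agent utilities."""
    return sum(utilities)


def monotonic_mix(utilities, weights, bias):
    """Hypothetical QMIX-like mixer (simplified assumption).

    Taking abs() of the weights makes dQ_tot/dQ_i >= 0 for every agent,
    which is the monotonicity constraint. This single linear layer stands in
    for QMIX's hypernetwork-generated mixing network.
    """
    return sum(abs(w) * u for w, u in zip(weights, utilities)) + bias


def greedy_joint_action(per_agent_q):
    """Decentralized execution: each agent maximizes only its own utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def best_joint_action(per_agent_q, mix):
    """Centralized brute force over every joint action (exponential in agents)."""
    action_ranges = [range(len(q)) for q in per_agent_q]
    return max(
        itertools.product(*action_ranges),
        key=lambda a: mix([q[ai] for q, ai in zip(per_agent_q, a)]),
    )


if __name__ == "__main__":
    random.seed(0)
    # 3 agents with 4 actions each; utilities are random placeholders.
    per_agent_q = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    weights = [random.uniform(-1, 1) for _ in range(3)]
    mix = lambda u: monotonic_mix(u, weights, bias=0.1)
    print("greedy:", greedy_joint_action(per_agent_q))
    print("brute force:", best_joint_action(per_agent_q, mix))
```

The same monotonicity that makes the per-agent argmax consistent also limits which joint action values these mixers can represent. That restriction is the one the abstract says can prevent VDN and QMIX from solving some tasks.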
Main Authors: | , , , , |
---|---|
Format: | Conference item |
Language: | English |
Published in: | PMLR, 2021 |