UneVEn: Universal value exploration for multi-agent reinforcement learning

VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted joint action value function can prevent them from solvi...
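A minimal sketch of why monotonic mixing makes decentralization easy. It assumes VDN-style additive mixing, where the joint value is the sum of per-agent utilities, and uses made-up utility values for two agents with three actions each. It is not taken from the paper. Because the sum is non-decreasing in each agent's utility, letting each agent greedily maximize its own utility picks the same action as a centralized search over all joint actions.

```python
import itertools

# Hypothetical per-agent utilities Q_i(a_i): two agents, three actions each.
q1 = [1.0, 3.0, 2.0]
q2 = [0.5, 0.2, 4.0]

def vdn_joint(a1, a2):
    # VDN-style mixing: the joint value is the sum of per-agent utilities,
    # so it is monotonic (non-decreasing) in each agent's utility.
    return q1[a1] + q2[a2]

# Centralized greedy: search over every joint action (a1, a2).
joint_best = max(itertools.product(range(3), range(3)),
                 key=lambda a: vdn_joint(*a))

# Decentralized greedy: each agent maximizes only its own utility.
decentral_best = (max(range(3), key=lambda a: q1[a]),
                  max(range(3), key=lambda a: q2[a]))

print(joint_best, decentral_best)  # prints "(1, 2) (1, 2)"
```

The centralized search grows exponentially with the number of agents. The per-agent argmax grows only linearly, and with a monotonic mixer it loses nothing.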

Bibliographic details
Main authors:	Gupta, T, Mahajan, A, Peng, B, Boehmer, W, Whiteson, S
Format:	Conference item
Language:	English
Published in:	PMLR 2021
