Decentralized multi-agent reinforcement learning based on best-response policies

Full description

Introduction: Multi-agent systems are an interdisciplinary research field that studies multiple decision-making individuals interacting with a usually partially observable environment. Given the recent advances in single-agent reinforcement learning, multi-agent reinforcement learning (MARL) has gained tremendous interest in recent years. Most research studies apply a fully centralized learning scheme to ease the transfer from the single-agent domain to multi-agent systems.

Methods: In contrast, we claim that a decentralized learning scheme is preferable for real-world applications, as it allows a learning algorithm to be deployed on an individual robot rather than on a complete fleet of robots. Therefore, this article outlines a novel actor–critic (AC) approach tailored to cooperative MARL problems in sparsely rewarded domains. Our approach decouples the MARL problem into a set of distributed agents that model the other agents as responsive entities. In particular, we propose using two separate critics per agent to distinguish between the joint task reward and agent-based costs, as commonly applied within multi-robot planning. On the one hand, the agent-based critic aims to decrease agent-specific costs. On the other hand, each agent aims to optimize the joint team reward based on the joint task critic. As this critic still depends on the joint action of all agents, we outline two suitable behavior models based on Stackelberg games: a game against nature and a dyadic game against each agent. Following these behavior models, our algorithm allows fully decentralized execution and training.

Results and Discussion: We evaluate the presented method with both proposed behavior models in a sparsely rewarded simulated multi-agent environment. While our approach already outperforms state-of-the-art learners, we conclude this article by outlining possible extensions of our algorithm that future research may build upon.

Bibliographic Details
Main Authors: Volker Gabler, Dirk Wollherr
Format: Article
Language: English
Published: Frontiers Media S.A., 2024-04-01
Series: Frontiers in Robotics and AI
ISSN: 2296-9144
DOI: 10.3389/frobt.2024.1229026
Collection: Directory of Open Access Journals (DOAJ)
Subjects: multi-agent reinforcement learning; game theory; deep learning, artificial intelligence; actor–critic algorithm; multi-agent; Stackelberg
Online Access: https://www.frontiersin.org/articles/10.3389/frobt.2024.1229026/full
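
The abstract describes two separate critics per agent, a joint task critic for the shared team reward and an agent-based critic for agent-specific costs, together with a Stackelberg-style model of the other agents' responses. The sketch below is a minimal, hypothetical illustration of that per-agent structure, assuming PyTorch; it is not the authors' implementation, and the network sizes, the response model, and all names (TwoCriticAgent, mlp, etc.) are illustrative assumptions only.

```python
# Minimal sketch (assumption: PyTorch), not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(in_dim, out_dim, hidden=64):
    """Small two-layer network; the size is an arbitrary illustrative choice."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))


class TwoCriticAgent(nn.Module):
    """One decentralized agent that keeps its own actor and two critics."""

    def __init__(self, obs_dim, n_actions, n_other_actions):
        super().__init__()
        self.actor = mlp(obs_dim, n_actions)                               # local policy pi_i(a_i | o_i)
        self.cost_critic = mlp(obs_dim + n_actions, 1)                     # agent-based critic: own costs
        self.task_critic = mlp(obs_dim + n_actions + n_other_actions, 1)   # joint task critic: team reward
        # Hypothetical stand-in for the Stackelberg-style behavior model:
        # predicts how the other agents respond to this agent's action.
        self.response_model = mlp(obs_dim + n_actions, n_other_actions)

    def act(self, obs):
        """Decentralized execution: sample an action from the local policy only."""
        return torch.distributions.Categorical(logits=self.actor(obs)).sample()

    def values(self, obs, own_action_onehot):
        """Evaluate both critics, filling in the others' actions via the response model."""
        x = torch.cat([obs, own_action_onehot], dim=-1)
        predicted_others = torch.sigmoid(self.response_model(x))
        q_cost = self.cost_critic(x)
        q_task = self.task_critic(torch.cat([x, predicted_others], dim=-1))
        return q_task, q_cost


# Usage: each robot holds and trains its own agent instance.
agent = TwoCriticAgent(obs_dim=8, n_actions=4, n_other_actions=4)
obs = torch.randn(1, 8)
action = agent.act(obs)
q_task, q_cost = agent.values(obs, F.one_hot(action, num_classes=4).float())
```

In the decentralized scheme described in the abstract, each agent would update its actor locally against both critics; the concrete update rule and the two behavior models (game against nature, dyadic game) are detailed in the article and are not reproduced here.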