Decentralized multi-agent reinforcement learning based on best-response policies

Full description

Introduction: Multi-agent systems are an interdisciplinary research field that studies multiple decision-making individuals interacting with a usually partially observable environment. Given the recent advances in single-agent reinforcement learning, multi-agent reinforcement learning (MARL) has gained tremendous interest in recent years. Most research studies apply a fully centralized learning scheme to ease the transfer from the single-agent domain to multi-agent systems.

Methods: In contrast, we claim that a decentralized learning scheme is preferable for real-world applications, as it allows a learning algorithm to be deployed on an individual robot rather than on a complete fleet of robots. Therefore, this article outlines a novel actor–critic (AC) approach tailored to cooperative MARL problems in sparsely rewarded domains. Our approach decouples the MARL problem into a set of distributed agents that model the other agents as responsive entities. In particular, we propose using two separate critics per agent to distinguish between the joint task reward and agent-based costs, as commonly applied within multi-robot planning. On the one hand, the agent-based critic aims to decrease agent-specific costs. On the other hand, each agent aims to optimize the joint team reward based on the joint task critic. As this critic still depends on the joint action of all agents, we outline two suitable behavior models based on Stackelberg games: a game against nature and a dyadic game against each agent. Following these behavior models, our algorithm allows fully decentralized execution and training.

Results and Discussion: We evaluate the presented method with both proposed behavior models in a sparsely rewarded simulated multi-agent environment. While our approach already outperforms state-of-the-art learners, we conclude this article by outlining possible extensions of our algorithm that future research may build upon.

Bibliographic Details
Main Authors: Volker Gabler, Dirk Wollherr
Format: Article
Language: English
Published: Frontiers Media S.A., 2024-04-01
Series: Frontiers in Robotics and AI
ISSN: 2296-9144
DOI: 10.3389/frobt.2024.1229026
Collection: Directory of Open Access Journals (DOAJ)
Subjects: multi-agent reinforcement learning; game theory; deep learning, artificial intelligence; actor–critic algorithm; multi-agent; Stackelberg
Online Access: https://www.frontiersin.org/articles/10.3389/frobt.2024.1229026/full
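
The abstract describes two separate critics per agent, a joint task critic for the shared team reward and an agent-based critic for agent-specific costs, together with a Stackelberg-style model of the other agents' responses. The sketch below is a minimal, hypothetical illustration of that per-agent structure, assuming PyTorch; it is not the authors' implementation, and the network sizes, the response model, and all names (TwoCriticAgent, mlp, etc.) are illustrative assumptions only.

```python
# Minimal sketch (assumption: PyTorch), not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(in_dim, out_dim, hidden=64):
    """Small two-layer network; the size is an arbitrary illustrative choice."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))


class TwoCriticAgent(nn.Module):
    """One decentralized agent that keeps its own actor and two critics."""

    def __init__(self, obs_dim, n_actions, n_other_actions):
        super().__init__()
        self.actor = mlp(obs_dim, n_actions)                               # local policy pi_i(a_i | o_i)
        self.cost_critic = mlp(obs_dim + n_actions, 1)                     # agent-based critic: own costs
        self.task_critic = mlp(obs_dim + n_actions + n_other_actions, 1)   # joint task critic: team reward
        # Hypothetical stand-in for the Stackelberg-style behavior model:
        # predicts how the other agents respond to this agent's action.
        self.response_model = mlp(obs_dim + n_actions, n_other_actions)

    def act(self, obs):
        """Decentralized execution: sample an action from the local policy only."""
        return torch.distributions.Categorical(logits=self.actor(obs)).sample()

    def values(self, obs, own_action_onehot):
        """Evaluate both critics, filling in the others' actions via the response model."""
        x = torch.cat([obs, own_action_onehot], dim=-1)
        predicted_others = torch.sigmoid(self.response_model(x))
        q_cost = self.cost_critic(x)
        q_task = self.task_critic(torch.cat([x, predicted_others], dim=-1))
        return q_task, q_cost


# Usage: each robot holds and trains its own agent instance.
agent = TwoCriticAgent(obs_dim=8, n_actions=4, n_other_actions=4)
obs = torch.randn(1, 8)
action = agent.act(obs)
q_task, q_cost = agent.values(obs, F.one_hot(action, num_classes=4).float())
```

In the decentralized scheme described in the abstract, each agent would update its actor locally against both critics; the concrete update rule and the two behavior models (game against nature, dyadic game) are detailed in the article and are not reproduced here.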