Multi-Agent Distributed Deep Deterministic Policy Gradient for Partially Observable Tracking

In many existing multi-agent reinforcement learning tasks, each agent observes all the other agents from its own perspective. In addition, the training process is centralized, meaning the critic of each agent can access the policies of all the agents. This scheme has certain limitations, since in practical applications each agent can only obtain the information of its neighbor agents due to the limited communication range. Therefore, this paper presents a multi-agent distributed deep deterministic policy gradient (MAD3PG) approach with decentralized actors and distributed critics to realize multi-agent distributed tracking. The distinguishing feature of the proposed framework is the use of multi-agent distributed training with decentralized execution, where each critic takes only its own agent's and the neighbor agents' policies into account. Experiments were conducted on distributed tracking tasks in multi-agent particle environments, where N (N = 3, N = 5) agents track a target agent under partial observation. The results show that the proposed method achieves a higher reward with a shorter training time than other methods, including MADDPG, DDPG, PPO, and DQN, leading to more efficient and effective multi-agent tracking.
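The distributed-critic idea described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the ring communication graph, the observation/action dimensions, and the function name `critic_input` are all assumptions.

```python
import numpy as np

# Sketch of a distributed critic's input: each agent's critic conditions
# only on its own and its neighbors' observations and actions, rather than
# on all N agents as in fully centralized training (e.g., MADDPG).

def critic_input(i, neighbors, obs, acts):
    """Concatenate agent i's observation/action with its neighbors' only."""
    ids = [i] + sorted(neighbors[i])
    return np.concatenate([np.concatenate([obs[j], acts[j]]) for j in ids])

# N = 5 trackers on a ring: each agent communicates with 2 neighbors.
neighbors = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
obs = {j: np.random.randn(4) for j in range(5)}   # 4-dim local observation
acts = {j: np.random.randn(2) for j in range(5)}  # 2-dim continuous action

x = critic_input(0, neighbors, obs, acts)
# The distributed critic sees 3 agents: 3 * (4 + 2) = 18 inputs,
# versus 5 * (4 + 2) = 30 for a fully centralized critic.
assert x.shape == (18,)
```

The input size of each critic thus scales with the neighborhood size rather than with N, which is what makes training feasible under a limited communication range.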


Bibliographic Details
Main Authors: Dongyu Fan, Haikuo Shen, Lijing Dong
Format: Article
Language: English
Published: MDPI AG, 2021-10-01
Series: Actuators
Subjects: multi-agent systems; deep reinforcement learning; actor–critic; partial observability
Online Access: https://www.mdpi.com/2076-0825/10/10/268
collection DOAJ
description In many existing multi-agent reinforcement learning tasks, each agent observes all the other agents from its own perspective. In addition, the training process is centralized, meaning the critic of each agent can access the policies of all the agents. This scheme has certain limitations, since in practical applications each agent can only obtain the information of its neighbor agents due to the limited communication range. Therefore, this paper presents a multi-agent distributed deep deterministic policy gradient (MAD3PG) approach with decentralized actors and distributed critics to realize multi-agent distributed tracking. The distinguishing feature of the proposed framework is the use of multi-agent distributed training with decentralized execution, where each critic takes only its own agent's and the neighbor agents' policies into account. Experiments were conducted on distributed tracking tasks in multi-agent particle environments, where N (N = 3, N = 5) agents track a target agent under partial observation. The results show that the proposed method achieves a higher reward with a shorter training time than other methods, including MADDPG, DDPG, PPO, and DQN, leading to more efficient and effective multi-agent tracking.
first_indexed 2024-03-10T06:47:59Z
id doaj.art-2f87181e28cc436dbafe4a65d2a51829
institution Directory Open Access Journal
issn 2076-0825
last_indexed 2024-03-10T06:47:59Z
doi 10.3390/act10100268
citation Actuators, vol. 10, iss. 10, art. 268 (2021-10-01)
affiliation (all authors) School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing 100044, China
topic multi-agent systems
deep reinforcement learning
actor–critic
partial observability
url https://www.mdpi.com/2076-0825/10/10/268