Multi-agent deep deterministic policy gradient algorithm for swarm systems

Bibliographic Details
Main Author: Bedi, Jannat
Other Authors: Zinovi Rabinovich
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access: https://hdl.handle.net/10356/148106
Description
Summary: This paper argues that multi-agent and swarm systems need better-suited decentralized reinforcement learning methods. It examines one existing algorithm for multi-agent domains, Multi-Agent Deep Deterministic Policy Gradient (MADDPG), and extends it to swarm systems. The paper begins by analyzing why traditional algorithms struggle in multi-agent and swarm settings: Q-learning is challenged by the inherent non-stationarity of the environment, while policy gradient methods suffer from variance that increases as the number of agents grows. It then presents an existing adaptation of actor-critic methods that takes the action policies of other agents into account and can learn policies requiring complex multi-agent coordination. This adaptation is extended to swarm systems through swarm parameter tuning, and the feasibility of the resulting algorithm for swarm systems is analyzed. The results are discussed and future improvements are suggested. In the chosen environment, MADDPG appears to be a suitable algorithm for small-scale swarm systems. However, further study is needed to realize the full potential of MADDPG for swarm systems.