Multi-Robot Flocking Control Based on Deep Reinforcement Learning
In this paper, we apply deep reinforcement learning (DRL) to solve the flocking control problem of multi-robot systems in complex environments with dynamic obstacles. Starting from the traditional flocking model, we propose a DRL framework for implementing multi-robot flocking control, eliminating t...
Main Authors: | Pengming Zhu, Wei Dai, Weijia Yao, Junchong Ma, Zhiwen Zeng, Huimin Lu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Multi-robot; deep reinforcement learning; flocking control; PER-MADDPG |
Online Access: | https://ieeexplore.ieee.org/document/9169650/ |
_version_ | 1818431312240836608 |
---|---|
author | Pengming Zhu; Wei Dai; Weijia Yao; Junchong Ma; Zhiwen Zeng; Huimin Lu |
author_sort | Pengming Zhu |
collection | DOAJ |
description | In this paper, we apply deep reinforcement learning (DRL) to the flocking control problem of multi-robot systems in complex environments with dynamic obstacles. Starting from the traditional flocking model, we propose a DRL framework for multi-robot flocking control that eliminates the tedious work of modeling and control design. We adopt the multi-agent deep deterministic policy gradient (MADDPG) algorithm, which additionally uses the information of multiple robots during learning to better predict the actions that robots will take. To address problems of the MADDPG algorithm such as low learning efficiency and slow convergence, this paper studies a prioritized experience replay (PER) mechanism and proposes the Prioritized Experience Replay-MADDPG (PER-MADDPG) algorithm. A priority evaluation function based on the temporal difference (TD) error determines which experiences are sampled preferentially from the replay buffer. Simulation results verify the effectiveness of the proposed algorithm: it converges faster and enables the robot group to complete the flocking task in an environment with obstacles. |
first_indexed | 2024-12-14T15:47:18Z |
format | Article |
id | doaj.art-17bec4a3249d473284b09a407ffc4828 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-14T15:47:18Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-17bec4a3249d473284b09a407ffc4828; 2022-12-21T22:55:28Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 150397; 150406; 10.1109/ACCESS.2020.3016951; 9169650; Multi-Robot Flocking Control Based on Deep Reinforcement Learning; Pengming Zhu (0, https://orcid.org/0000-0002-2440-5331); Wei Dai (1); Weijia Yao (2, https://orcid.org/0000-0003-0361-6620); Junchong Ma (3); Zhiwen Zeng (4); Huimin Lu (5, https://orcid.org/0000-0002-6375-581X); Robotics Research Center, College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China (listed for each of the six authors); https://ieeexplore.ieee.org/document/9169650/; Multi-robot; deep reinforcement learning; flocking control; PER-MADDPG |
title | Multi-Robot Flocking Control Based on Deep Reinforcement Learning |
topic | Multi-robot; deep reinforcement learning; flocking control; PER-MADDPG |
url | https://ieeexplore.ieee.org/document/9169650/ |
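
The description field above mentions a priority evaluation function, based on the TD error, that decides which experiences are sampled preferentially from the replay buffer. This record does not give the paper's own function, so the following is only a minimal sketch of the widely used proportional scheme, where the priority is |TD error| plus a small constant and the sampling probability is proportional to priority raised to a power. The class name `PrioritizedReplayBuffer` and the parameters `alpha`, `beta`, and `eps` are illustrative assumptions, not taken from the paper.

```python
import random
from typing import Any, List, Sequence, Tuple


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (assumed scheme, not the paper's):
    P(i) = p_i**alpha / sum_k p_k**alpha, with p_i = |TD error_i| + eps."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-5) -> None:
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps      # keeps zero-error transitions sampleable
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.pos = 0        # slot to overwrite once the buffer is full

    def add(self, transition: Any) -> None:
        # New transitions get the current maximum priority so they are
        # likely to be replayed soon after being stored.
        priority = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4) -> Tuple[List[int], List[Any], List[float]]:
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        indices = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return indices, [self.data[i] for i in indices], weights

    def update_priorities(self, indices: Sequence[int], td_errors: Sequence[float]) -> None:
        # Called after a learning step with the freshly computed TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a multi-agent setting such as MADDPG, each stored transition would typically hold the joint observations and actions of all robots, and `update_priorities` would be called with the critic's TD errors after every update. How the paper combines errors across agents is not stated in this record.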