A Survey on Population-Based Deep Reinforcement Learning

Many real-world applications can be described as large-scale games of imperfect information, which require extensive prior domain knowledge, especially in competitive or human–AI cooperation settings. Population-based training methods have become a popular solution to learn robust policies without a...

Bibliographic Details
Main Authors: Weifan Long, Taixian Hou, Xiaoyi Wei, Shichao Yan, Peng Zhai, Lihua Zhang
Format: Article
Language: English
Published: MDPI AG 2023-05-01
Series: Mathematics
Subjects: reinforcement learning, multi-agent reinforcement learning, self play, population play
Online Access: https://www.mdpi.com/2227-7390/11/10/2234
author Weifan Long
Taixian Hou
Xiaoyi Wei
Shichao Yan
Peng Zhai
Lihua Zhang
author_sort Weifan Long
collection DOAJ
description Many real-world applications can be described as large-scale games of imperfect information, which require extensive prior domain knowledge, especially in competitive or human–AI cooperation settings. Population-based training methods have become a popular solution to learn robust policies without any prior knowledge, which can generalize to policies of other players or humans. In this survey, we shed light on population-based deep reinforcement learning (PB-DRL) algorithms, their applications, and general frameworks. We introduce several independent subject areas, including naive self-play, fictitious self-play, population-play, evolution-based training methods, and the policy-space response oracle family. These methods provide a variety of approaches to solving multi-agent problems and are useful in designing robust multi-agent reinforcement learning algorithms that can handle complex real-life situations. Finally, we discuss challenges and hot topics in PB-DRL algorithms. We hope that this brief survey can provide guidance and insights for researchers interested in PB-DRL algorithms.
first_indexed 2024-03-11T03:32:12Z
format Article
id doaj.art-4ab3ae5333f0433a8a844b08a024d413
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-11T03:32:12Z
publishDate 2023-05-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-4ab3ae5333f0433a8a844b08a024d413
2023-11-18T02:17:59Z
eng
MDPI AG
Mathematics
2227-7390
2023-05-01
11
10
2234
10.3390/math11102234
A Survey on Population-Based Deep Reinforcement Learning
Weifan Long, Taixian Hou, Xiaoyi Wei, Shichao Yan, Peng Zhai, Lihua Zhang
Academy for Engineering and Technology, Fudan University, Shanghai 200433, China (all six authors)
https://www.mdpi.com/2227-7390/11/10/2234
reinforcement learning
multi-agent reinforcement learning
self play
population play
title A Survey on Population-Based Deep Reinforcement Learning
topic reinforcement learning
multi-agent reinforcement learning
self play
population play
url https://www.mdpi.com/2227-7390/11/10/2234