Curriculum Reinforcement Learning Based on K-Fold Cross Validation

Bibliographic Details
Main Authors: Zeyang Lin, Jun Lai, Xiliang Chen, Lei Cao, Jun Wang
Format: Article
Language: English
Published: MDPI AG, 2022-12-01
Series: Entropy
Subjects: deep reinforcement learning; automatic curriculum learning; K-fold cross validation; replay buffer
Online Access: https://www.mdpi.com/1099-4300/24/12/1787
collection DOAJ
description As deep reinforcement learning continues to advance in intelligent control, combining automatic curriculum learning with deep reinforcement learning can improve training performance and efficiency by progressing from easy tasks to difficult ones. Most existing automatic curriculum learning algorithms rank curricula using expert experience and a single network, which makes curriculum task ranking difficult and convergence slow. In this paper, we propose a curriculum reinforcement learning method based on K-fold cross validation that estimates a relative difficulty score for each curriculum task. Drawing on the human notion of learning from easy to difficult, the method divides automatic curriculum learning into a curriculum difficulty assessment stage and a curriculum sorting stage. Through parallel training of teacher models and cross-evaluation of task sample difficulty, the method can sequence curriculum learning tasks more effectively. Finally, comparative simulation experiments were carried out in two types of multi-agent environments. The results show that the K-fold cross-validation-based automatic curriculum learning method improves the training speed of the MADDPG algorithm and shows a degree of generality across multi-agent deep reinforcement learning algorithms that use a replay buffer mechanism.
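The abstract describes the method only at a high level. As an illustrative aid (not the authors' implementation), the following Python sketch shows one way the K-fold cross-evaluation of task difficulty could be organized: tasks are split into K folds, a teacher policy is trained on each training split, and each held-out task is scored by a teacher that never trained on it before the tasks are sorted from easy to hard. The helpers `train_policy` and `evaluate_return`, and the assumption that a higher evaluation return means an easier task, are placeholders for a real MADDPG training and evaluation loop.

```python
# Minimal sketch of K-fold curriculum difficulty scoring (assumptions noted above).
import random
from typing import Callable, Dict, List, Sequence


def kfold_difficulty_scores(
    tasks: Sequence[object],
    k: int,
    train_policy: Callable[[List[object]], object],      # hypothetical: trains a teacher on a task list
    evaluate_return: Callable[[object, object], float],  # hypothetical: mean return of teacher on a task
    seed: int = 0,
) -> Dict[int, float]:
    """Score each task by the return a held-out teacher achieves on it."""
    rng = random.Random(seed)
    indices = list(range(len(tasks)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # K roughly equal folds

    scores: Dict[int, float] = {}
    for i, held_out in enumerate(folds):
        # Train teacher i on every fold except the i-th (held-out) one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        teacher = train_policy([tasks[j] for j in train_idx])
        # Cross-evaluate: held-out tasks are scored by a teacher that never saw them.
        for j in held_out:
            scores[j] = evaluate_return(teacher, tasks[j])
    return scores


def sort_easy_to_hard(tasks: Sequence[object], scores: Dict[int, float]) -> List[object]:
    # Higher evaluation return is treated here as an easier task.
    return [tasks[j] for j in sorted(scores, key=scores.get, reverse=True)]
```

In the paper the teachers are trained in parallel; the loop above is sequential only to keep the sketch short.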
id doaj.art-064b3677fd734f60bf2dce501af09aab
issn 1099-4300
spelling Entropy, Vol. 24, Iss. 12, Article 1787 (2022-12-01), MDPI AG. DOI: 10.3390/e24121787. Author affiliation (all five authors): Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China.