Curriculum Reinforcement Learning Based on K-Fold Cross Validation
With the continuous development of deep reinforcement learning in intelligent control, combining automatic curriculum learning and deep reinforcement learning can improve the training performance and efficiency of algorithms from easy to difficult. Most existing automatic curriculum learning algorithms perform curriculum ranking through expert experience and a single network, which has the problems of difficult curriculum task ranking and slow convergence speed.
Main Authors: | Zeyang Lin, Jun Lai, Xiliang Chen, Lei Cao, Jun Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-12-01 |
Series: | Entropy |
Subjects: | deep reinforcement learning; automatic curriculum learning; K-fold cross validation; replay buffer |
Online Access: | https://www.mdpi.com/1099-4300/24/12/1787 |
_version_ | 1797459232776257536 |
---|---|
author | Zeyang Lin, Jun Lai, Xiliang Chen, Lei Cao, Jun Wang |
author_facet | Zeyang Lin, Jun Lai, Xiliang Chen, Lei Cao, Jun Wang |
author_sort | Zeyang Lin |
collection | DOAJ |
description | With the continuous development of deep reinforcement learning in intelligent control, combining automatic curriculum learning with deep reinforcement learning can improve the training performance and efficiency of algorithms by progressing from easy to difficult tasks. Most existing automatic curriculum learning algorithms perform curriculum ranking through expert experience and a single network, which leads to difficult curriculum task ranking and slow convergence. In this paper, we propose a curriculum reinforcement learning method based on K-Fold Cross Validation that can estimate the relative difficulty score of curriculum tasks. Drawing on the human concept of learning from easy to difficult, the method divides automatic curriculum learning into a curriculum difficulty assessment stage and a curriculum sorting stage. Through parallel training of teacher models and cross-evaluation of task sample difficulty, the method can better sequence curriculum learning tasks. Finally, comparative simulation experiments were carried out in two types of multi-agent environments. The experimental results show that the automatic curriculum learning method based on K-Fold cross-validation can improve the training speed of the MADDPG algorithm, and at the same time has a certain generality for multi-agent deep reinforcement learning algorithms based on the replay buffer mechanism. |
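The two-stage idea in the abstract (difficulty assessment, then sorting) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `train_teacher` and `evaluate` are hypothetical stand-ins for the paper's teacher-model training and per-task performance evaluation.

```python
import random

def kfold_difficulty_ranking(tasks, k, train_teacher, evaluate, seed=0):
    """Rank curriculum tasks from easy to hard via K-fold cross-evaluation.

    Each fold is held out in turn; a teacher model is trained on the
    remaining folds and then scores the held-out tasks, so no task is
    judged by a teacher that trained on it. `train_teacher` and
    `evaluate` are placeholders (assumptions, not the paper's API).
    """
    tasks = list(tasks)
    random.Random(seed).shuffle(tasks)
    folds = [tasks[i::k] for i in range(k)]
    difficulty = {}
    for i, held_out in enumerate(folds):
        # Train the teacher on everything except the held-out fold.
        train_split = [t for j, fold in enumerate(folds) if j != i for t in fold]
        teacher = train_teacher(train_split)
        for task in held_out:
            # Lower evaluated return -> harder task -> higher difficulty.
            difficulty[task] = -evaluate(teacher, task)
    # Sorting stage: easy tasks first.
    return sorted(tasks, key=difficulty.__getitem__)

# Toy usage: pretend a higher task id means a harder task.
ranking = kfold_difficulty_ranking(
    range(10), k=5,
    train_teacher=lambda split: None,
    evaluate=lambda teacher, t: -t,
)
```

The held-out evaluation is what makes the score a cross-validated estimate rather than a self-assessment by a single network.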
first_indexed | 2024-03-09T16:48:30Z |
format | Article |
id | doaj.art-064b3677fd734f60bf2dce501af09aab |
institution | Directory Open Access Journal |
issn | 1099-4300 |
language | English |
last_indexed | 2024-03-09T16:48:30Z |
publishDate | 2022-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Entropy |
spelling | doaj.art-064b3677fd734f60bf2dce501af09aab2023-11-24T14:42:56ZengMDPI AGEntropy1099-43002022-12-012412178710.3390/e24121787Curriculum Reinforcement Learning Based on K-Fold Cross ValidationZeyang Lin0Jun Lai1Xiliang Chen2Lei Cao3Jun Wang4Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, ChinaCommand & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, ChinaCommand & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, ChinaCommand & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, ChinaCommand & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, ChinaWith the continuous development of deep reinforcement learning in intelligent control, combining automatic curriculum learning and deep reinforcement learning can improve the training performance and efficiency of algorithms from easy to difficult. Most existing automatic curriculum learning algorithms perform curriculum ranking through expert experience and a single network, which has the problems of difficult curriculum task ranking and slow convergence speed. In this paper, we propose a curriculum reinforcement learning method based on K-Fold Cross Validation that can estimate the relativity score of task curriculum difficulty. Drawing lessons from the human concept of curriculum learning from easy to difficult, this method divides automatic curriculum learning into a curriculum difficulty assessment stage and a curriculum sorting stage. Through parallel training of the teacher model and cross-evaluation of task sample difficulty, the method can better sequence curriculum learning tasks. Finally, simulation comparison experiments were carried out in two types of multi-agent experimental environments. The experimental results show that the automatic curriculum learning method based on K-Fold cross-validation can improve the training speed of the MADDPG algorithm, and at the same time has a certain generality for multi-agent deep reinforcement learning algorithm based on the replay buffer mechanism.https://www.mdpi.com/1099-4300/24/12/1787deep reinforcement learningautomatic curriculum learningK-fold cross validationreplay buffer |
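The claimed generality rests on the replay-buffer mechanism: the curriculum only changes the order in which task transitions are generated and stored, while the learner samples from the buffer as usual. A minimal sketch of such a buffer (generic, not MADDPG-specific) looks like:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer.

    Curriculum ordering only changes which tasks generate the
    transitions pushed here; any algorithm that samples minibatches
    from such a buffer (e.g. MADDPG) can reuse the curriculum.
    """

    def __init__(self, capacity):
        # Oldest transitions are evicted automatically once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because the learner never sees the curriculum directly, only the buffer contents, the same easy-to-hard ordering can be plugged in front of other replay-based multi-agent algorithms unchanged.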
spellingShingle | Zeyang Lin Jun Lai Xiliang Chen Lei Cao Jun Wang Curriculum Reinforcement Learning Based on K-Fold Cross Validation Entropy deep reinforcement learning automatic curriculum learning K-fold cross validation replay buffer |
title | Curriculum Reinforcement Learning Based on K-Fold Cross Validation |
title_full | Curriculum Reinforcement Learning Based on K-Fold Cross Validation |
title_fullStr | Curriculum Reinforcement Learning Based on K-Fold Cross Validation |
title_full_unstemmed | Curriculum Reinforcement Learning Based on K-Fold Cross Validation |
title_short | Curriculum Reinforcement Learning Based on K-Fold Cross Validation |
title_sort | curriculum reinforcement learning based on k fold cross validation |
topic | deep reinforcement learning; automatic curriculum learning; K-fold cross validation; replay buffer |
url | https://www.mdpi.com/1099-4300/24/12/1787 |
work_keys_str_mv | AT zeyanglin curriculumreinforcementlearningbasedonkfoldcrossvalidation AT junlai curriculumreinforcementlearningbasedonkfoldcrossvalidation AT xiliangchen curriculumreinforcementlearningbasedonkfoldcrossvalidation AT leicao curriculumreinforcementlearningbasedonkfoldcrossvalidation AT junwang curriculumreinforcementlearningbasedonkfoldcrossvalidation |