Intelligent trainer for Dyna-style model-based deep reinforcement learning

Model-based reinforcement learning (MBRL) has been proposed as a promising alternative for tackling the high sampling cost of canonical RL by leveraging a system dynamics model to generate synthetic data for policy training. The MBRL framework, nevertheless, is inherentl...


Bibliographic Details
Main Authors: Dong, Linsen, Li, Yuanlong, Zhou, Xin, Wen, Yonggang, Guan, Kyle
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2022
Subjects: Engineering::Computer science and engineering; Reinforcement Learning; Ensemble Algorithm
Online Access: https://hdl.handle.net/10356/159633
_version_ 1811689353067364352
author Dong, Linsen
Li, Yuanlong
Zhou, Xin
Wen, Yonggang
Guan, Kyle
author2 School of Computer Science and Engineering
author_sort Dong, Linsen
collection NTU
description Model-based reinforcement learning (MBRL) has been proposed as a promising alternative for tackling the high sampling cost of canonical RL by leveraging a system dynamics model to generate synthetic data for policy training. The MBRL framework, nevertheless, is inherently limited by the convoluted process of jointly optimizing the control policy, learning the system dynamics, and sampling data from two sources controlled by complicated hyperparameters. As such, the training process involves an overwhelming amount of manual tuning and is prohibitively costly. In this research, we propose a "reinforcement on reinforcement" (RoR) architecture to decompose the convoluted tasks into two decoupled layers of RL. The inner layer is the canonical MBRL training process, which is formulated as a Markov decision process called the training process environment (TPE). The outer layer serves as an RL agent, called the intelligent trainer, which learns an optimal hyperparameter configuration for the inner TPE. This decomposition provides much-needed flexibility to implement different trainer designs, an approach referred to as "train the trainer." We propose and optimize two alternative trainer designs: 1) a unihead trainer and 2) a multihead trainer. The RoR framework is evaluated on five tasks in OpenAI Gym. Compared with three baseline methods, the proposed intelligent trainer methods achieve competitive autotuning performance, with up to 56% expected sampling cost saving without knowing the best parameter configurations in advance. The proposed trainer framework can be easily extended to tasks that require costly hyperparameter tuning.
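The two-layer idea in the description can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `ToyTPE` is a hypothetical stand-in for the training process environment whose reward is a made-up proxy for policy improvement, the tuned hyperparameter is assumed to be a real-to-synthetic data ratio, and a simple epsilon-greedy bandit is swapped in for the outer RL agent.

```python
import random


class ToyTPE:
    """Hypothetical stand-in for the training process environment (TPE).

    One step represents one inner training round run with a chosen
    hyperparameter. The reward is an invented proxy, not the paper's.
    """

    def __init__(self, best_ratio=0.6, noise=0.02, seed=0):
        self.best_ratio = best_ratio  # assumed optimum, hidden from the trainer
        self.noise = noise
        self.rng = random.Random(seed)

    def step(self, real_data_ratio):
        # Reward peaks when the chosen ratio matches the hidden best ratio.
        gap = abs(real_data_ratio - self.best_ratio)
        return 1.0 - gap + self.rng.gauss(0.0, self.noise)


class BanditTrainer:
    """Outer-layer agent: epsilon-greedy over candidate hyperparameter values."""

    def __init__(self, candidates, epsilon=0.1, seed=0):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.candidates)
        self.values = [0.0] * len(self.candidates)

    def select(self):
        # Try every candidate once, then explore with probability epsilon.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.candidates))
        return max(range(len(self.candidates)), key=lambda i: self.values[i])

    def update(self, index, reward):
        # Incremental mean of the rewards observed for this candidate.
        self.counts[index] += 1
        self.values[index] += (reward - self.values[index]) / self.counts[index]


def train_the_trainer(rounds=500):
    """Run the outer loop and return the candidate with the best estimate."""
    tpe = ToyTPE()
    trainer = BanditTrainer(candidates=[0.2, 0.4, 0.6, 0.8])
    for _ in range(rounds):
        i = trainer.select()
        reward = tpe.step(trainer.candidates[i])
        trainer.update(i, reward)
    best = max(range(len(trainer.candidates)), key=lambda i: trainer.values[i])
    return trainer.candidates[best]
```

Calling `train_the_trainer()` returns the candidate ratio that the outer agent currently rates highest. The point of the sketch is only the separation of concerns: the inner process exposes a step-and-reward interface, so the outer agent can be replaced without touching it.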
first_indexed 2024-10-01T05:46:45Z
format Journal Article
id ntu-10356/159633
institution Nanyang Technological University
language English
last_indexed 2024-10-01T05:46:45Z
publishDate 2022
record_format dspace
spelling ntu-10356/159633
2022-06-28T08:32:47Z
Energy Market Authority (EMA)
Info-communications Media Development Authority (IMDA)
National Research Foundation (NRF)
This work was supported in part by the Energy Program, National Research Foundation, Prime Minister’s Office, Singapore, administered by the Energy Market Authority of Singapore, under Award NRF2017EWT-EP003-023; in part by the Green Data Centre Research, administered by the Info-communications Media Development Authority, under Award NRF2015ENC-GDCR01001-003; and in part by the Behavioral Studies in the Energy, Water, Waste and Transportation Sector, under Award BSEWWT2017_2_06.
2022-06-28T08:32:47Z
2022-06-28T08:32:47Z
2020
Journal Article
Dong, L., Li, Y., Zhou, X., Wen, Y. & Guan, K. (2020). Intelligent trainer for Dyna-style model-based deep reinforcement learning. IEEE Transactions on Neural Networks and Learning Systems, 32(6), 2758-2771. https://dx.doi.org/10.1109/TNNLS.2020.3008249
2162-237X
https://hdl.handle.net/10356/159633
10.1109/TNNLS.2020.3008249
32866102
2-s2.0-85107364341
6
32
2758
2771
en
NRF2017EWT-EP003-023
NRF2015ENC-GDCR01001-003
BSEWWT2017_2_06
IEEE Transactions on Neural Networks and Learning Systems
© 2020 IEEE. All rights reserved.
title Intelligent trainer for Dyna-style model-based deep reinforcement learning
topic Engineering::Computer science and engineering
Reinforcement Learning
Ensemble Algorithm
url https://hdl.handle.net/10356/159633