An Active Exploration Method for Data Efficient Reinforcement Learning

Reinforcement learning (RL) is an effective method for controlling dynamic systems without prior knowledge. One of the most important and difficult problems in RL is improving data efficiency. Probabilistic inference for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process (GP) to model dynamic systems. However, it optimizes only the cumulative reward and does not account for the accuracy of the dynamics model, which is an important factor in controller learning. To further improve the data efficiency of PILCO, we propose its active-exploration version (AEPILCO), which uses information entropy to characterize samples. In the policy evaluation stage, we incorporate an information-entropy criterion into long-term sample prediction. Through this informative policy evaluation function, the algorithm obtains informative policy parameters in the policy improvement stage. Executing the resulting policy produces an informative sample set, which helps in learning an accurate dynamics model. AEPILCO thus improves data efficiency by actively selecting informative samples under the information-entropy criterion and thereby learning an accurate dynamics model. We demonstrate the validity and efficiency of the proposed algorithm on several challenging control problems involving a cart pole, a pendubot, a double pendulum, and a cart double pendulum. AEPILCO learns a controller in fewer trials than PILCO, as verified by theoretical analysis and experimental results.
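
The abstract outlines the key mechanism: an information-entropy term is folded into PILCO's long-term prediction during policy evaluation, so that the optimized policy also seeks informative states. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation; the helpers `gp_predict` (a moment-matched one-step prediction through the learned GP dynamics model) and `cost`, and the trade-off weight `beta`, are hypothetical names, since the exact form of the criterion is not given in this record.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a d-dimensional Gaussian:
    H = 0.5 * ln((2*pi*e)^d * |cov|)."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def evaluate_policy(params, gp_predict, cost, m0, S0, horizon, beta=0.1):
    """Entropy-augmented long-term policy evaluation (sketch).

    gp_predict(m, S, params) -> (m_next, S_next): propagates the state
    distribution one step through the GP dynamics model under the policy.
    cost(m, S): expected immediate cost under the state distribution.
    Minimizing the returned value trades cumulative cost against an
    entropy bonus that rewards visiting uncertain (informative) states.
    """
    m, S = m0, S0
    total = 0.0
    for _ in range(horizon):
        m, S = gp_predict(m, S, params)       # moment-matched GP prediction
        total += cost(m, S)                   # expected cost at this step
        total -= beta * gaussian_entropy(S)   # entropy bonus for exploration
    return total
```

A policy optimized against this objective should drive rollouts through poorly modelled regions of the state space, so the samples added to the GP training set are more informative and the dynamics model converges in fewer trials, which is the data-efficiency gain the abstract claims.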

Bibliographic Details
Main Authors: Zhao Dongfang, Liu Jiafeng, Wu Rui, Cheng Dansong, Tang Xianglong (School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi Street #92, Harbin 150001, China)
Format: Article
Language: English
Published: Sciendo, 2019-06-01
Series: International Journal of Applied Mathematics and Computer Science, Vol. 29, No. 2, pp. 351–362
ISSN: 2083-8492
Subjects: reinforcement learning; information entropy; PILCO; data efficiency
Online Access: https://doi.org/10.2478/amcs-2019-0026