An Active Exploration Method for Data Efficient Reinforcement Learning
Reinforcement learning (RL) constitutes an effective method of controlling dynamic systems without prior knowledge. One of the most important and difficult problems in RL is the improvement of data efficiency. Probabilistic inference for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process to model dynamic systems. However, it only focuses on optimizing cumulative rewards and does not consider the accuracy of the dynamic model, which is an important factor for controller learning. To further improve the data efficiency of PILCO, we propose its active exploration version (AEPILCO), which utilizes information entropy to describe samples. In the policy evaluation stage, we incorporate an information entropy criterion into long-term sample prediction. Through this informative policy evaluation function, our algorithm obtains informative policy parameters in the policy improvement stage. Using these policy parameters in the actual execution produces an informative sample set, which helps in learning an accurate dynamic model. Thus, the AEPILCO algorithm improves data efficiency by learning an accurate dynamic model, actively selecting informative samples based on the information entropy criterion. We demonstrate the validity and efficiency of the proposed algorithm on several challenging controller problems involving a cart pole, a pendubot, a double pendulum, and a cart double pendulum. The AEPILCO algorithm can learn a controller using fewer trials than PILCO, which is verified through theoretical analysis and experimental results.
Main Authors: | Zhao Dongfang, Liu Jiafeng, Wu Rui, Cheng Dansong, Tang Xianglong |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo, 2019-06-01 |
Series: | International Journal of Applied Mathematics and Computer Science, Vol. 29, No. 2 (2019), pp. 351-362 |
ISSN: | 2083-8492 |
Subjects: | reinforcement learning; information entropy; PILCO; data efficiency |
Author Affiliation: | School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi Street #92, Harbin 150001, China (all authors) |
Online Access: | https://doi.org/10.2478/amcs-2019-0026 |
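The abstract above describes AEPILCO only at a high level: PILCO's long-term policy evaluation is augmented with an information entropy criterion, so that optimized policies also steer the system through states that are informative for the Gaussian-process dynamics model. The paper's exact objective function is not reproduced in this record, so the following is only a minimal sketch of that idea; `propagate`, `expected_cost`, the trade-off weight `lam`, and all other names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a Gaussian N(mu, cov):
    H = 0.5 * (d * ln(2*pi*e) + ln det(cov))."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def informative_policy_cost(policy_params, propagate, expected_cost,
                            x0_mean, x0_cov, horizon, lam=0.1):
    """Entropy-augmented long-term cost in the spirit of AEPILCO.

    Assumed interfaces (hypothetical, for illustration only):
      propagate(mean, cov, params) -> (next_mean, next_cov)
        one-step PILCO-style GP moment matching of the state distribution;
      expected_cost(mean, cov) -> float
        expected immediate cost under the Gaussian state distribution.
    lam trades off cost minimisation against information gain.
    """
    mean, cov = x0_mean, x0_cov
    total = 0.0
    for _ in range(horizon):
        mean, cov = propagate(mean, cov, policy_params)
        # Subtracting the predictive entropy rewards rollouts through
        # regions where the GP dynamics model is still uncertain, so the
        # executed policy collects informative samples for model learning.
        total += expected_cost(mean, cov) - lam * gaussian_entropy(cov)
    return total
```

A policy-improvement step would then minimise this objective over `policy_params`, exactly where standard PILCO minimises the plain cumulative expected cost, before executing the resulting policy to collect new data.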