Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction
Abstract The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since interaction with the environment is expensive for the agent. One route to sample efficiency is to extract knowledge from existing samples and use it to guide exploration...
Main Authors: | Dongfang Zhao, Xu Huanshi, Zhang Xun |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2024-01-01 |
Series: | International Journal of Computational Intelligence Systems |
Subjects: | Reinforcement learning; Deep deterministic policy gradient; Gaussian process; Information entropy |
Online Access: | https://doi.org/10.1007/s44196-023-00389-1 |
_version_ | 1827382022256984064 |
---|---|
author | Dongfang Zhao; Xu Huanshi; Zhang Xun
author_facet | Dongfang Zhao; Xu Huanshi; Zhang Xun
author_sort | Dongfang Zhao |
collection | DOAJ |
description | Abstract The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since interaction with the environment is expensive for the agent. One route to sample efficiency is to extract knowledge from existing samples and use it to guide exploration. Typical RL algorithms achieve exploration using task-specific knowledge or by adding exploration noise. These methods are limited by the current policy's level of improvement and lack long-term planning. We propose a novel active exploration deep RL algorithm for continuous action spaces, named active exploration deep reinforcement learning (AEDRL). Our method uses a Gaussian process to model the dynamics, enabling a probabilistic description of predicted samples. Action selection is formulated as the solution of an optimization problem whose objective is specifically designed to select samples that minimize the uncertainty of the dynamics model. Active exploration is thus achieved through long-term optimized action selection. This long-horizon exploration provides more guidance for learning, enabling the agent to explore more informative regions of the action space. The proposed AEDRL algorithm is evaluated on several robotic control tasks, including the classic pendulum problem and five complex articulated robots. AEDRL learns a controller using fewer episodes and demonstrates strong performance and sample efficiency. |
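The abstract describes action selection as an optimization that minimizes the uncertainty of a Gaussian-process dynamics model. The paper's exact objective is not reproduced in this record; the following is a minimal illustrative sketch under assumed simplifications (a discrete candidate-action set, an RBF kernel, and all function names are hypothetical): among candidate actions, pick the one whose (state, action) input has the highest GP posterior variance, i.e. where the learned dynamics model is most uncertain.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between two sets of input points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predictive_variance(X_train, X_query, length_scale=1.0, noise=1e-3):
    """Posterior variance of a zero-mean GP at the query points."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_query, X_train, length_scale)
    K_ss = rbf_kernel(X_query, X_query, length_scale)
    solve = np.linalg.solve(K, K_s.T)          # K^{-1} k(x*, X)^T
    return np.diag(K_ss) - np.einsum('ij,ji->i', K_s, solve)

def select_exploratory_action(state, candidate_actions, visited_state_actions):
    """Choose the candidate action with the highest GP predictive variance,
    i.e. the (state, action) pair the dynamics model knows least about."""
    queries = np.hstack([np.tile(state, (len(candidate_actions), 1)),
                         candidate_actions])
    var = gp_predictive_variance(visited_state_actions, queries)
    return candidate_actions[np.argmax(var)]
```

With training data concentrated near one action, the sketch selects the candidate farthest from the visited region, which is the intuition behind uncertainty-driven active exploration; the paper's actual method optimizes over a continuous action space rather than a candidate list.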
first_indexed | 2024-03-08T14:12:27Z |
format | Article |
id | doaj.art-b489d98dc81c4b269c6963c233b413de |
institution | Directory Open Access Journal |
issn | 1875-6883 |
language | English |
last_indexed | 2024-03-08T14:12:27Z |
publishDate | 2024-01-01 |
publisher | Springer |
record_format | Article |
series | International Journal of Computational Intelligence Systems |
spelling | doaj.art-b489d98dc81c4b269c6963c233b413de 2024-01-14T12:36:00Z eng Springer International Journal of Computational Intelligence Systems 1875-6883 2024-01-01 Vol. 17, Iss. 1, pp. 1–8 10.1007/s44196-023-00389-1 Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction Dongfang Zhao, Xu Huanshi, Zhang Xun (School of Computer Science and Engineering, Beijing Technology and Business University). Abstract The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since interaction with the environment is expensive for the agent. One route to sample efficiency is to extract knowledge from existing samples and use it to guide exploration. Typical RL algorithms achieve exploration using task-specific knowledge or by adding exploration noise. These methods are limited by the current policy's level of improvement and lack long-term planning. We propose a novel active exploration deep RL algorithm for continuous action spaces, named active exploration deep reinforcement learning (AEDRL). Our method uses a Gaussian process to model the dynamics, enabling a probabilistic description of predicted samples. Action selection is formulated as the solution of an optimization problem whose objective is specifically designed to select samples that minimize the uncertainty of the dynamics model. Active exploration is thus achieved through long-term optimized action selection. This long-horizon exploration provides more guidance for learning, enabling the agent to explore more informative regions of the action space. The proposed AEDRL algorithm is evaluated on several robotic control tasks, including the classic pendulum problem and five complex articulated robots. AEDRL learns a controller using fewer episodes and demonstrates strong performance and sample efficiency. https://doi.org/10.1007/s44196-023-00389-1 Reinforcement learning; Deep deterministic policy gradient; Gaussian process; Information entropy |
spellingShingle | Dongfang Zhao; Xu Huanshi; Zhang Xun; Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction; International Journal of Computational Intelligence Systems; Reinforcement learning; Deep deterministic policy gradient; Gaussian process; Information entropy |
title | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_full | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_fullStr | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_full_unstemmed | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_short | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_sort | active exploration deep reinforcement learning for continuous action space with forward prediction |
topic | Reinforcement learning; Deep deterministic policy gradient; Gaussian process; Information entropy |
url | https://doi.org/10.1007/s44196-023-00389-1 |
work_keys_str_mv | AT dongfangzhao activeexplorationdeepreinforcementlearningforcontinuousactionspacewithforwardprediction AT xuhuanshi activeexplorationdeepreinforcementlearningforcontinuousactionspacewithforwardprediction AT zhangxun activeexplorationdeepreinforcementlearningforcontinuousactionspacewithforwardprediction |