Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction

Abstract The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since the agent expends significant time and resources interacting with the environment. One way to improve sample efficiency is to extract knowledge from existing samples and use it to guide explorat...

Full description

Bibliographic Details
Main Authors: Dongfang Zhao, Xu Huanshi, Zhang Xun
Format: Article
Language:English
Published: Springer 2024-01-01
Series:International Journal of Computational Intelligence Systems
Subjects: Reinforcement learning, Deep deterministic policy gradient, Gaussian process, Information entropy
Online Access:https://doi.org/10.1007/s44196-023-00389-1
collection DOAJ
description Abstract The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since the agent expends significant time and resources interacting with the environment. One way to improve sample efficiency is to extract knowledge from existing samples and use it to guide exploration. Typical RL algorithms achieve exploration using task-specific knowledge or by adding exploration noise; these methods are limited by the current level of policy improvement and lack long-term planning. We propose a novel active exploration deep RL algorithm for continuous action spaces, named active exploration deep reinforcement learning (AEDRL). Our method uses a Gaussian process to model the system dynamics, enabling a probabilistic description of predicted samples. Action selection is formulated as the solution of an optimization problem whose objective is specifically designed to select samples that minimize the uncertainty of the dynamics model. Active exploration is thus achieved through long-term optimized action selection. This long-horizon exploration provides stronger guidance for learning and enables the agent to explore more informative regions of the action space. The proposed AEDRL algorithm is evaluated on several robotic control tasks, including the classic pendulum problem and five complex articulated robots. AEDRL learns a controller in fewer episodes, demonstrating both strong performance and sample efficiency.
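The core idea in the abstract, choosing actions whose predicted outcomes are most uncertain under a Gaussian-process dynamics model, can be illustrated with a minimal sketch. This is not the paper's implementation: the toy 1-D dynamics, the candidate-grid search (standing in for the paper's optimization step), and all function names here are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy 1-D dynamics standing in for the unknown robot dynamics
# (the paper's tasks are a pendulum and articulated robots).
def true_dynamics(s, a):
    return 0.9 * s + 0.5 * np.sin(a)

# Previously collected transitions: (state, action) -> next state.
X = rng.uniform(-1.0, 1.0, size=(20, 2))  # columns: state, action
y = true_dynamics(X[:, 0], X[:, 1]) + 0.01 * rng.standard_normal(20)

# GP dynamics model: predicts a mean *and* a standard deviation,
# giving the probabilistic description of predicted samples.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

def select_action(state, candidates):
    """Active exploration: pick the candidate action whose predicted
    next state is most uncertain, i.e. the transition that would
    reduce the dynamics model's uncertainty the most once observed."""
    queries = np.column_stack([np.full(len(candidates), state), candidates])
    _, std = gp.predict(queries, return_std=True)
    return candidates[np.argmax(std)]

candidates = np.linspace(-2.0, 2.0, 41)
a = select_action(0.3, candidates)
print(a)
```

In the full algorithm this uncertainty-seeking choice would be one term in the action-selection objective, balanced against the task reward over a long horizon; the grid search above is only the simplest way to show the selection criterion itself.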
id doaj.art-b489d98dc81c4b269c6963c233b413de
issn 1875-6883
affiliation School of Computer Science and Engineering, Beijing Technology and Business University (all three authors)
topic Reinforcement learning
Deep deterministic policy gradient
Gaussian process
Information entropy