Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction
Abstract The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since interaction with the environment is expensive for the agent. One route to sample efficiency is to extract knowledge from existing samples and use it to guide exploration...
Main Authors: | Dongfang Zhao, Xu Huanshi, Zhang Xun |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2024-01-01 |
Series: | International Journal of Computational Intelligence Systems |
Subjects: | Reinforcement learning; Deep deterministic policy gradient; Gaussian process; Information entropy |
Online Access: | https://doi.org/10.1007/s44196-023-00389-1 |
_version_ | 1827382022256984064 |
---|---|
author | Dongfang Zhao; Xu Huanshi; Zhang Xun
author_facet | Dongfang Zhao; Xu Huanshi; Zhang Xun
author_sort | Dongfang Zhao |
collection | DOAJ |
description | Abstract The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since interaction with the environment is expensive for the agent. One route to sample efficiency is to extract knowledge from existing samples and use it to guide exploration. Typical RL algorithms achieve exploration using task-specific knowledge or by adding exploration noise. These methods are limited by the current policy's level of improvement and lack long-term planning. We propose a novel active exploration deep RL algorithm for continuous action spaces, named active exploration deep reinforcement learning (AEDRL). Our method uses a Gaussian process to model the dynamics, enabling a probabilistic description of predicted samples. Action selection is formulated as the solution of an optimization problem whose objective is specifically designed to select samples that minimize the uncertainty of the dynamics model. Active exploration is thus achieved through long-term optimized action selection. This long-horizon exploration provides more guidance for learning, enabling the agent to explore more informative regions of the action space. The proposed AEDRL algorithm is evaluated on several robotic control tasks, including the classic pendulum problem and five complex articulated robots. AEDRL learns a controller using fewer episodes and demonstrates strong performance and sample efficiency. |
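The abstract describes action selection as an optimization that minimizes the uncertainty of a Gaussian-process dynamics model. The paper's exact objective is not reproduced in this record; the following is a minimal illustrative sketch under assumed simplifications (a discrete candidate-action set, an RBF kernel, and all function names are hypothetical): among candidate actions, pick the one whose (state, action) input has the highest GP posterior variance, i.e. where the learned dynamics model is most uncertain.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between two sets of input points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predictive_variance(X_train, X_query, length_scale=1.0, noise=1e-3):
    """Posterior variance of a zero-mean GP at the query points."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_query, X_train, length_scale)
    K_ss = rbf_kernel(X_query, X_query, length_scale)
    solve = np.linalg.solve(K, K_s.T)          # K^{-1} k(x*, X)^T
    return np.diag(K_ss) - np.einsum('ij,ji->i', K_s, solve)

def select_exploratory_action(state, candidate_actions, visited_state_actions):
    """Choose the candidate action with the highest GP predictive variance,
    i.e. the (state, action) pair the dynamics model knows least about."""
    queries = np.hstack([np.tile(state, (len(candidate_actions), 1)),
                         candidate_actions])
    var = gp_predictive_variance(visited_state_actions, queries)
    return candidate_actions[np.argmax(var)]
```

With training data concentrated near one action, the sketch selects the candidate farthest from the visited region, which is the intuition behind uncertainty-driven active exploration; the paper's actual method optimizes over a continuous action space rather than a candidate list.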
first_indexed | 2024-03-08T14:12:27Z |
format | Article |
id | doaj.art-b489d98dc81c4b269c6963c233b413de |
institution | Directory Open Access Journal |
issn | 1875-6883 |
language | English |
last_indexed | 2024-03-08T14:12:27Z |
publishDate | 2024-01-01 |
publisher | Springer |
record_format | Article |
series | International Journal of Computational Intelligence Systems |
spelling | doaj.art-b489d98dc81c4b269c6963c233b413de 2024-01-14T12:36:00Z eng Springer International Journal of Computational Intelligence Systems 1875-6883 2024-01-01 Vol. 17, Iss. 1, pp. 1–8 10.1007/s44196-023-00389-1 Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction Dongfang Zhao, Xu Huanshi, Zhang Xun (School of Computer Science and Engineering, Beijing Technology and Business University). Abstract The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since interaction with the environment is expensive for the agent. One route to sample efficiency is to extract knowledge from existing samples and use it to guide exploration. Typical RL algorithms achieve exploration using task-specific knowledge or by adding exploration noise. These methods are limited by the current policy's level of improvement and lack long-term planning. We propose a novel active exploration deep RL algorithm for continuous action spaces, named active exploration deep reinforcement learning (AEDRL). Our method uses a Gaussian process to model the dynamics, enabling a probabilistic description of predicted samples. Action selection is formulated as the solution of an optimization problem whose objective is specifically designed to select samples that minimize the uncertainty of the dynamics model. Active exploration is thus achieved through long-term optimized action selection. This long-horizon exploration provides more guidance for learning, enabling the agent to explore more informative regions of the action space. The proposed AEDRL algorithm is evaluated on several robotic control tasks, including the classic pendulum problem and five complex articulated robots. AEDRL learns a controller using fewer episodes and demonstrates strong performance and sample efficiency. https://doi.org/10.1007/s44196-023-00389-1 Reinforcement learning; Deep deterministic policy gradient; Gaussian process; Information entropy |
spellingShingle | Dongfang Zhao; Xu Huanshi; Zhang Xun; Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction; International Journal of Computational Intelligence Systems; Reinforcement learning; Deep deterministic policy gradient; Gaussian process; Information entropy |
title | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_full | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_fullStr | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_full_unstemmed | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_short | Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction |
title_sort | active exploration deep reinforcement learning for continuous action space with forward prediction |
topic | Reinforcement learning; Deep deterministic policy gradient; Gaussian process; Information entropy |
url | https://doi.org/10.1007/s44196-023-00389-1 |
work_keys_str_mv | AT dongfangzhao activeexplorationdeepreinforcementlearningforcontinuousactionspacewithforwardprediction AT xuhuanshi activeexplorationdeepreinforcementlearningforcontinuousactionspacewithforwardprediction AT zhangxun activeexplorationdeepreinforcementlearningforcontinuousactionspacewithforwardprediction |