Ensemble Bootstrapped Deep Deterministic Policy Gradient for Vision-Based Robotic Grasping

Bibliographic Details
Main Authors:	Weiwei Liu, Linpeng Peng, Junjie Cao, Xiaokuan Fu, Yong Liu, Zaisheng Pan, Jian Yang
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Robotic grasping, reinforcement learning, ensemble learning, Thompson sampling
Online Access:	https://ieeexplore.ieee.org/document/9316755/

collection	DOAJ
description	With sufficient practice, humans can grasp objects they have never seen before. Robotic manipulators, by contrast, are widely used in industrial production yet can still grasp only specific objects, because most grasping algorithms rely on prior knowledge such as hand-eye calibration results and object model features, and therefore target only specific object types. When the task scenario or the target object changes, such algorithms cannot be redeployed effectively. To address this, researchers often train grasping algorithms with reinforcement learning. However, reinforcement learning for manipulator grasping faces three main problems: low sample efficiency, poor training stability, and limited exploration. This article combines learning from demonstration (LfD), behavior cloning (BC), and deep deterministic policy gradient (DDPG) to improve sample efficiency, and uses multiple critics that jointly evaluate input actions to address training instability. Finally, inspired by Thompson sampling, input actions are evaluated from different perspectives, which increases exploration of the environment and reduces the number of environment interactions required. On this basis, the article designs two algorithms, EDDPG and EBDDPG. To further improve generalization, the method uses no extra information that is difficult to obtain directly on a physical platform, such as the true coordinates of the target object, and the policy outputs actions in the continuous Cartesian motion space of the manipulator's end effector. Simulation results show that, for the same number of interactions, the manipulator's success rate in grasping 1000 random objects more than doubles and reaches state-of-the-art (SOTA) performance.
issn	2169-3536
doi	10.1109/ACCESS.2021.3049860
volume	9
pages	19916-19925
affiliation	Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China (Weiwei Liu, Linpeng Peng, Junjie Cao, Xiaokuan Fu, Yong Liu, Zaisheng Pan); China Research and Development Academy of Machinery Equipment, Beijing, China (Jian Yang)
orcid	Weiwei Liu https://orcid.org/0000-0002-2496-7748; Linpeng Peng https://orcid.org/0000-0003-1754-9879; Xiaokuan Fu https://orcid.org/0000-0001-7115-5091; Yong Liu https://orcid.org/0000-0003-4822-8939
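
The description names the method's ingredients only at a high level. As a rough illustration, not the authors' EBDDPG code, the Python sketch below shows two of them under simplifying assumptions: an ensemble of critics in which each critic learns from a bootstrapped (Bernoulli-masked) subset of transitions and one critic, sampled at random per episode, supplies the value the actor maximizes, a common way to approximate Thompson sampling with an ensemble; and a DDPG actor loss augmented with a behavior-cloning penalty toward demonstration actions. Linear critics stand in for neural networks, and target networks, the actor network, gradient computation through the actor, and image observations are omitted. All class and function names, the mask probability `p_mask`, and the squared-error form of the BC term are hypothetical choices made for this example.

```python
# Hypothetical sketch, not the article's implementation:
# bootstrapped critic ensemble with per-episode head sampling + BC-augmented actor loss.
import numpy as np


class LinearCritic:
    """Toy critic Q(s, a) = w . [s, a] + b, standing in for a neural network."""

    def __init__(self, dim, rng):
        self.w = rng.normal(scale=0.1, size=dim)
        self.b = 0.0

    def q(self, s, a):
        return float(self.w @ np.concatenate([s, a]) + self.b)

    def td_update(self, s, a, target, lr=0.01):
        # One gradient-descent step on 0.5 * (Q(s, a) - target) ** 2
        x = np.concatenate([s, a])
        err = self.q(s, a) - target
        self.w -= lr * err * x
        self.b -= lr * err


class BootstrappedCriticEnsemble:
    """Hypothetical ensemble of critics with bootstrap masks and per-episode sampling."""

    def __init__(self, n_critics, dim, p_mask=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.critics = [LinearCritic(dim, self.rng) for _ in range(n_critics)]
        self.p_mask = p_mask
        self.active = 0  # index of the critic guiding the current episode

    def sample_mask(self):
        # One Bernoulli(p_mask) flag per critic, stored alongside each transition
        return self.rng.random(len(self.critics)) < self.p_mask

    def begin_episode(self):
        # Thompson-sampling-style exploration: pick one critic at random for this episode
        self.active = int(self.rng.integers(len(self.critics)))
        return self.active

    def update(self, s, a, r, s_next, a_next, mask, gamma=0.99, lr=0.01, done=False):
        # Each critic trains only on transitions its mask admits and bootstraps from itself
        for critic, m in zip(self.critics, mask):
            if m:
                target = r if done else r + gamma * critic.q(s_next, a_next)
                critic.td_update(s, a, target, lr)

    def q_all(self, s, a):
        return np.array([c.q(s, a) for c in self.critics])

    def q_active(self, s, a):
        # Value the actor maximizes during the current episode
        return self.critics[self.active].q(s, a)

    def q_mean(self, s, a):
        # Ensemble average, e.g. for evaluation
        return float(self.q_all(s, a).mean())


def actor_loss(q_pi, policy_action, demo_action=None, bc_weight=1.0):
    """Loss value -Q(s, pi(s)), plus a squared-error BC term when a demo action exists."""
    loss = -q_pi
    if demo_action is not None:
        loss += bc_weight * float(np.sum((policy_action - demo_action) ** 2))
    return loss
```

In this sketch, `sample_mask()` would be called when a transition is stored, `begin_episode()` at the start of each episode, `q_active()` when computing the actor objective during that episode, and `q_mean()` when an averaged estimate is wanted, for example at evaluation time. This record does not state whether the article selects critics per episode, per step, or by some other rule.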
