The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place
This paper proposes a task decomposition and dedicated reward-system-based reinforcement learning algorithm for the Pick-and-Place task, which is one of the high-level tasks of robot manipulators. The proposed method decomposes the Pick-and-Place task into three subtasks: two reaching tasks and one...
Main Authors: | Byeongjun Kim, Gunam Kwon, Chaneun Park, Nam Kyu Kwon |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-06-01 |
Series: | Biomimetics |
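Subjects: | deep reinforcement learning, Soft Actor-Critic, Pick-and-Place, task decomposition, robot manipulator |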
Online Access: | https://www.mdpi.com/2313-7673/8/2/240 |
_version_ | 1827738297686818816 |
---|---|
author | Byeongjun Kim Gunam Kwon Chaneun Park Nam Kyu Kwon |
author_facet | Byeongjun Kim Gunam Kwon Chaneun Park Nam Kyu Kwon |
author_sort | Byeongjun Kim |
collection | DOAJ |
description | This paper proposes a reinforcement learning algorithm for the Pick-and-Place task, one of the high-level tasks of robot manipulators, based on task decomposition and a dedicated reward system. The proposed method decomposes the Pick-and-Place task into three subtasks: two reaching tasks and one grasping task. One reaching task approaches the object, and the other reaches the place position. Each reaching task is carried out using the optimal policy of an agent trained with Soft Actor-Critic (SAC). Unlike the two reaching tasks, grasping is implemented via simple logic, which is easy to design but may result in improper gripping. To support the grasping task, a dedicated reward system for approaching the object is designed using individual axis-based weights. To verify the validity of the proposed method, we carry out various experiments in the MuJoCo physics engine with the Robosuite framework. According to the simulation results of four trials, the robot manipulator picked up the object and released it at the goal position with an average success rate of 93.2%. |
first_indexed | 2024-03-11T02:42:24Z |
format | Article |
id | doaj.art-0669baea245e4893909f2e5ea9db1cb3 |
institution | Directory Open Access Journal |
issn | 2313-7673 |
language | English |
last_indexed | 2024-03-11T02:42:24Z |
publishDate | 2023-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Biomimetics |
spelling | doaj.art-0669baea245e4893909f2e5ea9db1cb3; 2023-11-18T09:29:41Z; eng; MDPI AG; Biomimetics; 2313-7673; 2023-06-01; 8; 2; 240; 10.3390/biomimetics8020240; The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place; Byeongjun Kim (0), Gunam Kwon (1), Chaneun Park (2), Nam Kyu Kwon (3); (0) Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea; (1) Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea; (2) School of Electronics Engineering, Kyungpook National University, Daegu 41566, Republic of Korea; (3) Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea; This paper proposes a task decomposition and dedicated reward-system-based reinforcement learning algorithm for the Pick-and-Place task, which is one of the high-level tasks of robot manipulators. The proposed method decomposes the Pick-and-Place task into three subtasks: two reaching tasks and one grasping task. One of the two reaching tasks is approaching the object, and the other is reaching the place position. These two reaching tasks are carried out using each optimal policy of the agents which are trained using Soft Actor-Critic (SAC). Different from the two reaching tasks, the grasping is implemented via simple logic which is easily designable but may result in improper gripping. To assist the grasping task properly, a dedicated reward system for approaching the object is designed through using individual axis-based weights. To verify the validity of the proposed method, we carry out various experiments in the MuJoCo physics engine with the Robosuite framework. According to the simulation results of four trials, the robot manipulator picked up and released the object in the goal position with an average success rate of 93.2%.; https://www.mdpi.com/2313-7673/8/2/240; deep reinforcement learning; Soft Actor-Critic; Pick-and-Place; task decomposition; robot manipulator |
spellingShingle | Byeongjun Kim Gunam Kwon Chaneun Park Nam Kyu Kwon The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place Biomimetics deep reinforcement learning Soft Actor-Critic Pick-and-Place task decomposition robot manipulator |
title | The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place |
title_full | The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place |
title_fullStr | The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place |
title_full_unstemmed | The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place |
title_short | The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place |
title_sort | task decomposition and dedicated reward system based reinforcement learning algorithm for pick and place |
topic | deep reinforcement learning Soft Actor-Critic Pick-and-Place task decomposition robot manipulator |
url | https://www.mdpi.com/2313-7673/8/2/240 |
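
The description above outlines two ideas: a reward for approaching the object that weights each axis individually, and a decomposition into approach, grasp, and place subtasks. The sketch below illustrates both in plain Python. It is not the authors' implementation: this record does not give the paper's reward formula, weight values, or grasp logic, so the weighted absolute-error form, the `AXIS_WEIGHTS` values, the tolerance `tol`, and the function names `axis_weighted_reward` and `next_phase` are all assumptions chosen for illustration.

```python
import math

# Hypothetical per-axis weights (x, y, z). The record does not state the
# paper's values; a larger z weight is assumed here only as an example.
AXIS_WEIGHTS = (1.0, 1.0, 2.0)


def axis_weighted_reward(gripper_pos, target_pos, weights=AXIS_WEIGHTS):
    """Assumed reward form: negative weighted sum of per-axis absolute errors."""
    return -sum(
        w * abs(t - g) for w, g, t in zip(weights, gripper_pos, target_pos)
    )


def next_phase(phase, gripper_pos, object_pos, place_pos, tol=0.01):
    """Advance through the three subtasks: approach -> grasp -> place -> done.

    In the described method, 'approach' and 'place' would each be driven by
    a separately trained SAC policy; 'grasp' is a scripted step.
    """
    if phase == "approach" and math.dist(gripper_pos, object_pos) < tol:
        return "grasp"
    if phase == "grasp":
        # Scripted grasp (assumed): close the gripper once, then move on.
        return "place"
    if phase == "place" and math.dist(gripper_pos, place_pos) < tol:
        return "done"
    return phase


if __name__ == "__main__":
    print(axis_weighted_reward((0.0, 0.0, 0.0), (0.1, 0.0, 0.1)))
    print(next_phase("approach", (0.0, 0.0, 0.0), (0.0, 0.0, 0.005), (1.0, 1.0, 1.0)))
```

Weighting the axes separately lets the reward penalize misalignment along one direction more heavily than along the others, which is one plausible way to bring the gripper into a pose where a simple scripted grasp is more likely to succeed.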