Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control
In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. The dynamic hindsight experience replay method improves the learning efficiency of the algorithm by matching the trajectories of past fa...
Main Authors: | Xueyu Wei, Lilong Duan, Wei Xue |
---|---|
Format: | Article |
Language: | English |
Published: | Tamkang University Press, 2023-08-01 |
Series: | Journal of Applied Science and Engineering |
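Subjects: | reinforcement learning; multi-goal learning; hindsight experience replay; hindsight bias; reward-weighted |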
Online Access: | http://jase.tku.edu.tw/articles/jase-202312-26-12-0015 |
_version_ | 1797740568538775552 |
---|---|
author | Xueyu Wei; Lilong Duan; Wei Xue |
author_facet | Xueyu Wei; Lilong Duan; Wei Xue |
author_sort | Xueyu Wei |
collection | DOAJ |
description | In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. The dynamic hindsight experience replay (DHER) method improves learning efficiency by matching the trajectories of past failed episodes to create successful experiences. However, these experiences are sampled and replayed at random, without considering how important each episode sample is for learning. As a result, bias is introduced during training and the improvement in sample efficiency is suboptimal. To address these issues, this paper introduces a reward-weighted mechanism based on dynamic hindsight experience replay (RDHER). We extend dynamic hindsight experience replay with a trade-off that makes the rewards calculated for hindsight experience numerically greater than the actual rewards. Specifically, the hindsight rewards are multiplied by a weighting factor to increase the Q-value of the hindsight state–action pair, which drives the policy update toward selecting the action with the highest Q-value for the given hindsight transitions. Our experiments show that the proposed method reduces hindsight bias during training. Further, we demonstrate that RDHER is effective in challenging robot manipulation tasks and outperforms several other multi-goal baseline methods in terms of success rate. |
first_indexed | 2024-03-12T14:14:16Z |
format | Article |
id | doaj.art-244753531ddf4da9b9895091ac7ed7f2 |
institution | Directory Open Access Journal |
issn | 2708-9967 2708-9975 |
language | English |
last_indexed | 2024-03-12T14:14:16Z |
publishDate | 2023-08-01 |
publisher | Tamkang University Press |
record_format | Article |
series | Journal of Applied Science and Engineering |
spelling | doaj.art-244753531ddf4da9b9895091ac7ed7f2; 2023-08-20T18:46:32Z; eng; Tamkang University Press; Journal of Applied Science and Engineering; ISSN 2708-9967, 2708-9975; 2023-08-01; vol. 26, no. 12, pp. 1829-1841; doi:10.6180/jase.202312_26(12).0015; Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control; Xueyu Wei (School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China); Lilong Duan (School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China); Wei Xue (School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China); In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. The dynamic hindsight experience replay (DHER) method improves learning efficiency by matching the trajectories of past failed episodes to create successful experiences. However, these experiences are sampled and replayed at random, without considering how important each episode sample is for learning. As a result, bias is introduced during training and the improvement in sample efficiency is suboptimal. To address these issues, this paper introduces a reward-weighted mechanism based on dynamic hindsight experience replay (RDHER). We extend dynamic hindsight experience replay with a trade-off that makes the rewards calculated for hindsight experience numerically greater than the actual rewards. Specifically, the hindsight rewards are multiplied by a weighting factor to increase the Q-value of the hindsight state–action pair, which drives the policy update toward selecting the action with the highest Q-value for the given hindsight transitions. Our experiments show that the proposed method reduces hindsight bias during training. Further, we demonstrate that RDHER is effective in challenging robot manipulation tasks and outperforms several other multi-goal baseline methods in terms of success rate.; http://jase.tku.edu.tw/articles/jase-202312-26-12-0015; reinforcement learning; multi-goal learning; hindsight experience replay; hindsight bias; reward-weighted |
spellingShingle | Xueyu Wei; Lilong Duan; Wei Xue; Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control; Journal of Applied Science and Engineering; reinforcement learning; multi-goal learning; hindsight experience replay; hindsight bias; reward-weighted |
title | Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control |
title_full | Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control |
title_fullStr | Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control |
title_full_unstemmed | Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control |
title_short | Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control |
title_sort | reward weighted dher mechanism for multi goal reinforcement learning with application to robotic manipulation control |
topic | reinforcement learning; multi-goal learning; hindsight experience replay; hindsight bias; reward-weighted |
url | http://jase.tku.edu.tw/articles/jase-202312-26-12-0015 |
work_keys_str_mv | AT xueyuwei rewardweighteddhermechanismformultigoalreinforcementlearningwithapplicationtoroboticmanipulationcontrol AT lilongduan rewardweighteddhermechanismformultigoalreinforcementlearningwithapplicationtoroboticmanipulationcontrol AT weixue rewardweighteddhermechanismformultigoalreinforcementlearningwithapplicationtoroboticmanipulationcontrol |
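
The abstract describes multiplying hindsight rewards by a weighting factor so that they are numerically greater than the actual rewards. The following is only a minimal sketch of that single step, not the authors' implementation. Everything named here is an assumption made for illustration: the `Transition` record, the `success_reward` function (1 on success, 0 otherwise), the tolerance `tol`, and the weight value `2.0`. It also uses plain goal relabeling with the achieved goal instead of DHER's trajectory matching, which the record does not specify in detail.

```python
from dataclasses import dataclass
from typing import Tuple

Goal = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """Hypothetical replay-buffer record (names are illustrative)."""
    achieved_goal: Goal         # position actually reached
    desired_goal: Goal          # goal the policy was conditioned on
    reward: float
    is_hindsight: bool = False  # True if the goal was relabeled afterwards


def success_reward(achieved: Goal, desired: Goal, tol: float = 0.05) -> float:
    """Assumed reward convention: 1.0 within tolerance of the goal, else 0.0."""
    dist = sum((a - d) ** 2 for a, d in zip(achieved, desired)) ** 0.5
    return 1.0 if dist <= tol else 0.0


def relabel_with_weight(t: Transition, hindsight_goal: Goal,
                        weight: float = 2.0) -> Transition:
    """Relabel a transition with a hindsight goal and scale its reward.

    The reward is recomputed against the hindsight goal and multiplied by
    `weight` (assumed > 1), so a successful hindsight transition carries a
    larger reward than a successful real one.
    """
    base = success_reward(t.achieved_goal, hindsight_goal)
    return Transition(t.achieved_goal, hindsight_goal, weight * base, True)


# A failed transition: the achieved position is far from the desired goal.
achieved = (0.2, 0.1, 0.0)
desired = (0.5, 0.5, 0.0)
failed = Transition(achieved, desired, success_reward(achieved, desired))

# Relabel with the achieved position as the hindsight goal.
hindsight = relabel_with_weight(failed, hindsight_goal=failed.achieved_goal)
print(failed.reward, hindsight.reward)
```

Under these assumptions, a larger hindsight reward raises the bootstrapped Q-target for the relabeled state-action pair. With a different reward convention (for example, 0 on success and -1 otherwise) a multiplicative weight would behave differently, so the scaling would need to be adapted.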