Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control

In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. The dynamic hindsight experience replay method improves the learning efficiency of the algorithm by matching the trajectories of past fa...

Bibliographic Details
Main Authors: Xueyu Wei, Lilong Duan, Wei Xue
Format: Article
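Language: English
Published: Tamkang University Press 2023-08-01
Series: Journal of Applied Science and Engineering
Subjects: reinforcement learning; multi-goal learning; hindsight experience replay; hindsight bias; reward-weighted
Online Access: http://jase.tku.edu.tw/articles/jase-202312-26-12-0015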
author Xueyu Wei
Lilong Duan
Wei Xue
collection DOAJ
description In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. The dynamic hindsight experience replay (DHER) method improves learning efficiency by matching the trajectories of past failed episodes to create successful experiences. However, these experiences are sampled and replayed at random, without considering how important each episode sample is for learning. As a result, bias is introduced into the training process, and the gains in sample efficiency are suboptimal. To address these issues, this paper introduces a reward-weighted mechanism based on dynamic hindsight experience replay (RDHER). We extend dynamic hindsight experience replay with a trade-off that makes the rewards calculated for hindsight experiences numerically greater than the actual rewards. Specifically, the hindsight rewards are multiplied by a weighting factor to increase the Q-value of the hindsight state–action pair, which drives the policy update to select the maximizing action for the given hindsight transitions. Our experiments show that the proposed method reduces hindsight bias during training. Further, we demonstrate that RDHER is effective in challenging robot manipulation tasks and outperforms several other multi-goal baseline methods in terms of success rate.
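The reward-weighting step mentioned in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the `Transition` type, the 0/1 sparse reward, the tolerance `tol`, the default `weight` of 2.0, and the use of plain "final goal" hindsight relabeling in place of DHER's trajectory matching, which the record does not specify.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """Hypothetical container for one step of a goal-conditioned episode."""
    achieved_goal: Vec       # position actually reached after the action
    desired_goal: Vec        # goal the agent was asked to reach
    reward: float
    hindsight: bool = False  # True once the goal has been relabeled


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Assumed 0/1 sparse reward: 1.0 within `tol` of the goal, else 0.0."""
    return 1.0 if math.dist(achieved, goal) <= tol else 0.0


def relabel_with_weight(episode: List[Transition],
                        weight: float = 2.0) -> List[Transition]:
    """Relabel a failed episode with its final achieved goal and scale
    the recomputed hindsight reward by `weight` (assumed > 1)."""
    new_goal = episode[-1].achieved_goal
    relabeled = []
    for t in episode:
        # Hindsight reward is multiplied by the weighting factor, so a
        # successful hindsight step is worth more than a real success.
        r = weight * sparse_reward(t.achieved_goal, new_goal)
        relabeled.append(
            replace(t, desired_goal=new_goal, reward=r, hindsight=True)
        )
    return relabeled


# A failed episode: the agent never reaches the desired goal (1.0, 1.0).
episode = [
    Transition((0.0, 0.0), (1.0, 1.0), 0.0),
    Transition((0.2, 0.1), (1.0, 1.0), 0.0),
    Transition((0.5, 0.4), (1.0, 1.0), 0.0),
]
relabeled = relabel_with_weight(episode, weight=2.0)
```

Under these assumptions, only the last relabeled transition earns a nonzero reward, and that reward equals the weight rather than 1.0, which is the sense in which hindsight rewards become numerically greater than actual rewards.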
first_indexed 2024-03-12T14:14:16Z
format Article
id doaj.art-244753531ddf4da9b9895091ac7ed7f2
institution Directory of Open Access Journals
issn 2708-9967
2708-9975
language English
last_indexed 2024-03-12T14:14:16Z
publishDate 2023-08-01
publisher Tamkang University Press
record_format Article
series Journal of Applied Science and Engineering
spelling doaj.art-244753531ddf4da9b9895091ac7ed7f2
2023-08-20T18:46:32Z
eng
Tamkang University Press
Journal of Applied Science and Engineering
2708-9967
2708-9975
2023-08-01
volume 26, issue 12, pages 1829-1841
10.6180/jase.202312_26(12).0015
Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control
Xueyu Wei (School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China)
Lilong Duan (School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China)
Wei Xue (School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China)
title Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control
topic reinforcement learning
multi-goal learning
hindsight experience replay
hindsight bias
reward-weighted
url http://jase.tku.edu.tw/articles/jase-202312-26-12-0015