Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control

In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. The dynamic hindsight experience replay method improves the learning efficiency of the algorithm by matching the trajectories of past fa...

Bibliographic Details
Main Authors:	Xueyu Wei, Lilong Duan, Wei Xue
Format:	Article
Language:	English
Published:	Tamkang University Press 2023-08-01
Series:	Journal of Applied Science and Engineering
Subjects:	reinforcement learning; multi-goal learning; hindsight experience replay; hindsight bias; reward-weighted
Online Access:	http://jase.tku.edu.tw/articles/jase-202312-26-12-0015

_version_	1797740568538775552
author	Xueyu Wei Lilong Duan Wei Xue
author_facet	Xueyu Wei Lilong Duan Wei Xue
author_sort	Xueyu Wei
collection	DOAJ
description	In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. The dynamic hindsight experience replay (DHER) method improves learning efficiency by matching the trajectories of past failed episodes to create successful experiences. However, these experiences are sampled and replayed at random, without considering how important each episode sample is for learning. As a result, bias is introduced during training and the gains in sample efficiency are suboptimal. To address these issues, this paper introduces a reward-weighted mechanism based on dynamic hindsight experience replay (RDHER). We extend dynamic hindsight experience replay with a trade-off that makes the rewards calculated for hindsight experience numerically greater than the actual rewards. Specifically, the hindsight rewards are multiplied by a weighting factor to increase the Q-value of the hindsight state–action pair, which drives the policy update toward selecting the action with the highest Q-value for the given hindsight transitions. Our experiments show that the proposed method reduces hindsight bias during training. Further, we demonstrate that RDHER is effective in challenging robot manipulation tasks and outperforms several other multi-goal baseline methods in terms of success rate.
first_indexed	2024-03-12T14:14:16Z
format	Article
id	doaj.art-244753531ddf4da9b9895091ac7ed7f2
institution	Directory of Open Access Journals
issn	2708-9967 2708-9975
language	English
last_indexed	2024-03-12T14:14:16Z
publishDate	2023-08-01
publisher	Tamkang University Press
record_format	Article
series	Journal of Applied Science and Engineering
spelling	doaj.art-244753531ddf4da9b9895091ac7ed7f2 2023-08-20T18:46:32Z eng Tamkang University Press Journal of Applied Science and Engineering 2708-9967 2708-9975 2023-08-01 26 12 1829 1841 10.6180/jase.202312_26(12).0015 Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control Xueyu Wei 0 Lilong Duan 1 Wei Xue 2 School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. Dynamic hindsight experience replay method improves the learning efficiency of the algorithm by matching the trajectories of past failed episodes and creating successful experiences. But these experiences are sampled and replayed by a random strategy, without considering the importance of the episode samples for learning. Therefore, not only bias is introduced as the training process, but also suboptimal improvements in terms of sample efficiency are obtained. To address these issues, this paper introduces a reward-weighted mechanism based on the dynamic hindsight experience replay (RDHER). We extend dynamic hindsight experience replay with a trade-off to make rewards calculated for hindsight experience numerically greater than actual rewards. Specifically, the hindsight rewards are multiplied by a weighting factor to increase the Q-value of the hindsight state–action pair, which drives the update of the policy to select the maximum action for the given hindsight transitions. Our experiments show that the hindsight bias can be reduced in training using the proposed method. Further, we demonstrate RDHER is effective in challenging robot manipulation tasks, and outperforms several other multi-goal baseline methods in terms of success rate. http://jase.tku.edu.tw/articles/jase-202312-26-12-0015 reinforcement learning multi-goal learning hindsight experience replay hindsight bias reward-weighted
spellingShingle	Xueyu Wei Lilong Duan Wei Xue Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control Journal of Applied Science and Engineering reinforcement learning multi-goal learning hindsight experience replay hindsight bias reward-weighted
title	Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control
title_full	Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control
title_fullStr	Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control
title_full_unstemmed	Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control
title_short	Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control
title_sort	reward weighted dher mechanism for multi goal reinforcement learning with application to robotic manipulation control
topic	reinforcement learning; multi-goal learning; hindsight experience replay; hindsight bias; reward-weighted
url	http://jase.tku.edu.tw/articles/jase-202312-26-12-0015
work_keys_str_mv	AT xueyuwei rewardweighteddhermechanismformultigoalreinforcementlearningwithapplicationtoroboticmanipulationcontrol AT lilongduan rewardweighteddhermechanismformultigoalreinforcementlearningwithapplicationtoroboticmanipulationcontrol AT weixue rewardweighteddhermechanismformultigoalreinforcementlearningwithapplicationtoroboticmanipulationcontrol
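
The description field above says that hindsight rewards are multiplied by a weighting factor so that they end up numerically greater than the actual rewards. The following is a minimal sketch of that one step only, not the authors' implementation. The record does not give the reward convention, the value of the weighting factor, or any code, so the names `Transition`, `sparse_reward`, `relabel_with_weight`, the 1.0/0.0 sparse reward, the tolerance `tol`, and the default `weight=2.0` are all assumptions made for illustration. The sketch also relabels against a single fixed goal and leaves out DHER's matching of trajectories across failed episodes.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One step of an episode (hypothetical container, not from the paper)."""
    achieved_goal: Vec      # position actually reached after the action
    goal: Vec               # goal the reward was computed against
    reward: float
    hindsight: bool = False  # True once the transition has been relabeled


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Assumed convention: 1.0 if the goal is reached within tol, else 0.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 1.0 if dist <= tol else 0.0


def relabel_with_weight(
    episode: Sequence[Transition],
    new_goal: Vec,
    weight: float = 2.0,   # assumed value; the record does not state it
    tol: float = 0.05,
) -> List[Transition]:
    """Relabel an episode with new_goal and scale the hindsight reward."""
    relabeled = []
    for t in episode:
        # Recompute the reward against the substituted goal, then weight it
        # so a hindsight success is worth more than an actual success.
        r = weight * sparse_reward(t.achieved_goal, new_goal, tol)
        relabeled.append(replace(t, goal=new_goal, reward=r, hindsight=True))
    return relabeled


# A failed two-step episode: the original goal (5, 5) is never reached.
episode = [
    Transition(achieved_goal=(0.0, 0.0), goal=(5.0, 5.0), reward=0.0),
    Transition(achieved_goal=(1.0, 1.0), goal=(5.0, 5.0), reward=0.0),
]
# Treat the final achieved position as the hindsight goal.
hindsight_episode = relabel_with_weight(episode, episode[-1].achieved_goal)
print([t.reward for t in hindsight_episode])
```

Under the assumed 1.0/0.0 convention, any `weight` above 1 makes a relabeled success exceed the reward an actual success would receive, which is the property the description attributes to RDHER. With a 0/-1 convention the scaling would have to be defined differently.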

Similar Items