Batch Prioritization in Multigoal Reinforcement Learning

Bibliographic Details
Main Authors: Luiz Felipe Vecchietti, Taeyoung Kim, Kyujin Choi, Junhee Hong, Dongsoo Har
Format: Article
Language: English
Published: IEEE 2020-01-01
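Series: IEEE Access
Subjects: Experience replay; batch prioritization; goal distribution; reinforcement learning; intended goal
Online Access: https://ieeexplore.ieee.org/document/9149884/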
author Luiz Felipe Vecchietti
Taeyoung Kim
Kyujin Choi
Junhee Hong
Dongsoo Har
collection DOAJ
description In multigoal reinforcement learning, an agent interacts with an environment and learns to achieve multiple goals. The goal-conditioned policy is trained to generalize its behavior across these goals. During training, the experiences collected by the agent are randomly sampled from a replay buffer. Because biased sampling of achieved goals affects the success rate of a given task, it should be avoided by considering the valid goal space, introduced here as the set of goals to achieve, and the current competence of the policy. To this end, a novel prioritization method for creating batches, i.e., collections of samples, is proposed. Candidate batches are sampled and each is assigned a cost; in each iteration, the batch with the minimum cost is chosen to train the policy. The cost function is modeled using an intended goal, proposed as a hypothetical goal that the policy is trying to learn in each cycle, together with information about the valid goal space. The minimum cost of the batch selected in each iteration decreases throughout training as the policy learns to achieve goals near the center of the valid goal space. The proposed batch prioritization method is combined with hindsight experience replay (HER) in experiments on robotic control tasks from the OpenAI Gym suite, where it demonstrates learning performance comparable to that of other state-of-the-art prioritization methods and improves learning performance in 4 out of 5 tasks, particularly the harder ones. The experimental results suggest that creating training batches using valid goal space information and the current competence of the policy can enhance learning performance in multigoal tasks with high-dimensional goal spaces.
first_indexed 2024-12-13T11:16:31Z
format Article
id doaj.art-734f8921cfda49dfa93dc479046da8c9
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-13T11:16:31Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-734f8921cfda49dfa93dc479046da8c9
2022-12-21T23:48:36Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
Volume 8, pages 137449-137461
DOI: 10.1109/ACCESS.2020.3012204
IEEE document number: 9149884
Batch Prioritization in Multigoal Reinforcement Learning
Luiz Felipe Vecchietti (https://orcid.org/0000-0003-2862-6200), Cho Chun Shik Graduate School of Green Transportation, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Taeyoung Kim (https://orcid.org/0000-0002-1384-3459), Cho Chun Shik Graduate School of Green Transportation, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Kyujin Choi (https://orcid.org/0000-0002-6153-6541), Cho Chun Shik Graduate School of Green Transportation, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Junhee Hong (https://orcid.org/0000-0003-1285-1454), Department of Energy IT, Gachon University, Seongnam, South Korea
Dongsoo Har (https://orcid.org/0000-0002-6949-1739), Cho Chun Shik Graduate School of Green Transportation, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
https://ieeexplore.ieee.org/document/9149884/
Experience replay
batch prioritization
goal distribution
reinforcement learning
intended goal
title Batch Prioritization in Multigoal Reinforcement Learning
topic Experience replay
batch prioritization
goal distribution
reinforcement learning
intended goal
url https://ieeexplore.ieee.org/document/9149884/
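
The selection step summarized in the description (sample several candidate batches, assign each a cost, train on the minimum-cost batch) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: this record does not give the cost function, so `batch_cost` uses a hypothetical stand-in (the Euclidean distance between the mean achieved goal of a batch and the intended goal), and the names `batch_cost`, `select_min_cost_batch`, and the `"achieved_goal"` key are invented for the example.

```python
import math
import random


def batch_cost(batch, intended_goal):
    """Hypothetical cost (an assumption, not the paper's formula):
    distance from the batch's mean achieved goal to the intended goal."""
    dim = len(intended_goal)
    mean_goal = [
        sum(sample["achieved_goal"][i] for sample in batch) / len(batch)
        for i in range(dim)
    ]
    return math.dist(mean_goal, intended_goal)


def select_min_cost_batch(replay_buffer, intended_goal, batch_size,
                          num_candidates, rng):
    """Sample candidate batches uniformly and return the cheapest one."""
    # Each candidate is drawn without replacement from the replay buffer.
    candidates = [rng.sample(replay_buffer, batch_size)
                  for _ in range(num_candidates)]
    costs = [batch_cost(batch, intended_goal) for batch in candidates]
    # Index of the candidate with the minimum cost.
    best = min(range(num_candidates), key=costs.__getitem__)
    return candidates[best], costs[best]


# Toy usage: a buffer of 2-D achieved goals and an intended goal at the origin.
rng = random.Random(0)
buffer = [{"achieved_goal": (rng.uniform(-1, 1), rng.uniform(-1, 1))}
          for _ in range(200)]
batch, cost = select_min_cost_batch(buffer, (0.0, 0.0), batch_size=16,
                                    num_candidates=8, rng=rng)
```

In a training loop, the returned batch would be passed to the policy update in place of a single uniformly sampled batch; how the intended goal is updated from cycle to cycle is not specified in this record and is left out of the sketch.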