Combined Constraint on Behavior Cloning and Discriminator in Offline Reinforcement Learning

In recent years, reinforcement learning (RL) has attracted considerable attention because it can automatically learn optimal behavioral policies. However, because RL acquires a policy through repeated interaction with the environment, it is difficult to apply to realistic tasks. This has motivated extensive research on offline RL (batch RL), which does not interact with the environment but instead learns from previously collected experience. Common RL methods fail when applied directly to the offline setting because of a problem called distributional shift, so methods to suppress distributional shift have been actively studied in offline RL. In this study, we propose a new offline RL algorithm that adds a constraint derived from the discriminator used in Generative Adversarial Networks to the offline RL method TD3+BC. We compare and validate the proposed method against existing methods on a benchmark of 3D robot-control simulation tasks. TD3+BC tightens its behavior-cloning constraint to suppress distributional shift, but when the quality of the dataset is poor this constraint makes successful learning difficult. The proposed approach addresses this issue by retaining the features that mitigate distributional shift while introducing a new constraint that makes learning less dependent on the dataset's quality, aiming to improve accuracy even when the dataset exhibits poor characteristics.
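
For readers who want a concrete picture of the combined constraint, the following PyTorch sketch illustrates one way a TD3+BC-style actor loss could be augmented with a GAN-discriminator penalty. It is a minimal illustration under stated assumptions, not the authors' exact formulation: the network sizes, the penalty weight beta, and the -log D(s, pi(s)) form of the discriminator term are assumptions made for this sketch; only the lambda-normalized Q term and the behavior-cloning MSE follow the standard published TD3+BC objective.

```python
# Illustrative sketch only (not the paper's code): TD3+BC actor update
# plus an assumed GAN-discriminator penalty that pushes policy actions
# toward the dataset's state-action distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim = 17, 6  # assumed dimensions, e.g. a MuJoCo task

actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))
# Discriminator D(s, a): probability that the pair comes from the dataset.
disc = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                     nn.Linear(256, 1), nn.Sigmoid())

actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
alpha = 2.5   # TD3+BC's Q-scaling coefficient (its published default)
beta = 0.1    # assumed weight for the discriminator penalty

def update_step(states, actions):
    """One actor/discriminator update on a batch of dataset (s, a) pairs.
    The critic is assumed to be trained separately with the usual TD3 targets."""
    pi = actor(states)

    # Discriminator update: dataset pairs are "real", policy pairs are "fake".
    real = disc(torch.cat([states, actions], dim=-1))
    fake = disc(torch.cat([states, pi.detach()], dim=-1))
    disc_loss = -(torch.log(real + 1e-8) + torch.log(1.0 - fake + 1e-8)).mean()
    disc_opt.zero_grad(); disc_loss.backward(); disc_opt.step()

    # Actor update: the TD3+BC loss plus the assumed discriminator penalty.
    q = critic(torch.cat([states, pi], dim=-1))
    lam = alpha / q.abs().mean().detach()        # TD3+BC's Q normalization
    bc_loss = F.mse_loss(pi, actions)            # behavior-cloning constraint
    d = disc(torch.cat([states, pi], dim=-1))
    disc_pen = -torch.log(d + 1e-8).mean()       # keep pi near the data manifold
    actor_loss = -lam * q.mean() + bc_loss + beta * disc_pen
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```

Intuitively, the BC term pins the policy to the exact dataset actions, while a discriminator term only asks that policy actions be indistinguishable from dataset behavior; the latter is a looser constraint, which matches the abstract's goal of learning that is less dependent on dataset quality.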

Bibliographic Details
Main Authors: Shunya Kidera (https://orcid.org/0009-0009-7049-5882), Kosuke Shintani, Toi Tsuneda (https://orcid.org/0000-0003-1913-9179), Satoshi Yamane (https://orcid.org/0000-0001-7883-4054)
Affiliation: Electrical Engineering Department, Kanazawa University, Kanazawa, Japan (all authors)
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access, Vol. 12, pp. 19942–19951
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3361030
Subjects: Reinforcement learning; offline reinforcement learning; generative adversarial networks; discriminator; robot control
Online Access: https://ieeexplore.ieee.org/document/10418100/