Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions

Over the last decade, methods for autonomous control by artificial intelligence have been extensively developed based on deep reinforcement learning (DRL) technologies. Despite these advances, however, robustness to noise in observation data remains an issue for autonomous control policies implemented using DRL in practical applications. In this study, we present a general robust adversarial learning technique applicable to DRL. During the adversarial learning process, policies are trained through regularization learning to output consistent control actions, even for adversarial input examples. Importantly, these adversarial examples are crafted to lead the current policy to predict the worst action at each state. Although a naive implementation of regularization learning may cause the DRL model to learn a biased objective function, our methods were found to minimize this bias. When implemented as modifications of a deep Q-network for discrete-action problems in Atari 2600 games and of a deep deterministic policy gradient for continuous-action tasks in PyBullet, our new adversarial learning frameworks showed significantly enhanced robustness against adversarial and random noise added to the input, compared with several recently proposed methods.

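The abstract describes the method only at a high level. As a rough illustration (not the authors' implementation), the sketch below shows one way the two ingredients could be realized for a discrete-action Q-network in PyTorch: an FGSM-style perturbation that pushes each state toward its worst action (the action with the lowest predicted Q-value), and a KL-divergence regularizer that penalizes disagreement between the policy's outputs on clean and perturbed inputs. All names here (q_net, epsilon, reg_weight) are illustrative assumptions, not identifiers from the paper.

    # Hypothetical sketch of worst-case adversarial regularization for a
    # DQN-style agent (PyTorch). Not the authors' code; names are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def worst_case_perturbation(q_net: nn.Module, state: torch.Tensor,
                                epsilon: float) -> torch.Tensor:
        """FGSM-style step toward the state's worst action: perturb the
        input so the Q-value of the lowest-valued action increases."""
        state = state.clone().detach().requires_grad_(True)
        q_values = q_net(state)                       # (batch, n_actions)
        worst = q_values.argmin(dim=1, keepdim=True)  # worst action per state
        loss = q_values.gather(1, worst).sum()        # raise worst action's Q
        grad = torch.autograd.grad(loss, state)[0]
        return (state + epsilon * grad.sign()).detach()

    def consistency_loss(q_net: nn.Module, state: torch.Tensor,
                         adv_state: torch.Tensor) -> torch.Tensor:
        """KL regularizer: the action distribution implied by the perturbed
        input should match the (detached) distribution from the clean input."""
        p_clean = F.softmax(q_net(state), dim=1).detach()
        log_p_adv = F.log_softmax(q_net(adv_state), dim=1)
        return F.kl_div(log_p_adv, p_clean, reduction="batchmean")

    # Inside a training step, the regularizer would be added to the usual
    # TD loss:
    #   adv_s = worst_case_perturbation(q_net, s, epsilon=0.01)
    #   loss  = td_loss + reg_weight * consistency_loss(q_net, s, adv_s)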

Bibliographic Details
Main Authors: Kohei Ohashi, Kosuke Nakanishi, Yuji Yasui, Shin Ishii
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Deep reinforcement learning; adversarial example; robustness; regularization
Online Access: https://ieeexplore.ieee.org/document/10250423/
_version_ 1797676722305368064
author Kohei Ohashi
Kosuke Nakanishi
Yuji Yasui
Shin Ishii
author_facet Kohei Ohashi
Kosuke Nakanishi
Yuji Yasui
Shin Ishii
author_sort Kohei Ohashi
collection DOAJ
description Over the last decade, methods for autonomous control by artificial intelligence have been extensively developed based on deep reinforcement learning (DRL) technologies. Despite these advances, however, robustness to noise in observation data remains an issue for autonomous control policies implemented using DRL in practical applications. In this study, we present a general robust adversarial learning technique applicable to DRL. During the adversarial learning process, policies are trained through regularization learning to output consistent control actions, even for adversarial input examples. Importantly, these adversarial examples are crafted to lead the current policy to predict the worst action at each state. Although a naive implementation of regularization learning may cause the DRL model to learn a biased objective function, our methods were found to minimize this bias. When implemented as modifications of a deep Q-network for discrete-action problems in Atari 2600 games and of a deep deterministic policy gradient for continuous-action tasks in PyBullet, our new adversarial learning frameworks showed significantly enhanced robustness against adversarial and random noise added to the input, compared with several recently proposed methods.
first_indexed 2024-03-11T22:34:14Z
format Article
id doaj.art-f646c54671da448f9ed62a431b72a8f7
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-11T22:34:14Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-f646c54671da448f9ed62a431b72a8f7
2023-09-22T23:01:17Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
Volume 11, pp. 100798-100809
doi: 10.1109/ACCESS.2023.3314750
IEEE document ID: 10250423
Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
Kohei Ohashi, Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Kosuke Nakanishi (https://orcid.org/0000-0002-0078-6942), Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Yuji Yasui, Honda Research and Development Company Ltd., Saitama, Japan
Shin Ishii, Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
https://ieeexplore.ieee.org/document/10250423/
Keywords: Deep reinforcement learning; adversarial example; robustness; regularization
spellingShingle Kohei Ohashi
Kosuke Nakanishi
Yuji Yasui
Shin Ishii
Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
IEEE Access
Deep reinforcement learning
adversarial example
robustness
regularization
title Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_full Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_fullStr Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_full_unstemmed Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_short Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_sort deep adversarial reinforcement learning method to generate control policies robust against worst case value predictions
topic Deep reinforcement learning
adversarial example
robustness
regularization
url https://ieeexplore.ieee.org/document/10250423/
work_keys_str_mv AT koheiohashi deepadversarialreinforcementlearningmethodtogeneratecontrolpoliciesrobustagainstworstcasevaluepredictions
AT kosukenakanishi deepadversarialreinforcementlearningmethodtogeneratecontrolpoliciesrobustagainstworstcasevaluepredictions
AT yujiyasui deepadversarialreinforcementlearningmethodtogeneratecontrolpoliciesrobustagainstworstcasevaluepredictions
AT shinishii deepadversarialreinforcementlearningmethodtogeneratecontrolpoliciesrobustagainstworstcasevaluepredictions