Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions

Over the last decade, methods for autonomous control by artificial intelligence have been extensively developed based on deep reinforcement learning (DRL) technologies. Despite these advances, however, robustness to noise in observation data remains an issue for autonomous control policies implemented using DRL in practical applications. In this study, we present a general robust adversarial learning technique applicable to DRL. During the adversarial learning process, policies are trained through regularization learning to output consistent control actions, even for adversarial input examples. Importantly, these adversarial examples are crafted to lead the current policy to predict the worst action at each state. Although a naive implementation of regularization learning may cause the DRL model to learn a biased objective function, our methods were found to minimize this bias. When implemented as modifications of a deep Q-network for discrete-action problems in Atari 2600 games and of a deep deterministic policy gradient for continuous-action tasks in PyBullet, our new adversarial learning frameworks showed significantly enhanced robustness against adversarial and random noise added to the input, compared with several recently proposed methods.

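The abstract describes the method only at a high level. As a rough illustration (not the authors' implementation), the sketch below shows one way the two ingredients could be realized for a discrete-action Q-network in PyTorch: an FGSM-style perturbation that pushes each state toward its worst action (the action with the lowest predicted Q-value), and a KL-divergence regularizer that penalizes disagreement between the policy's outputs on clean and perturbed inputs. All names here (q_net, epsilon, reg_weight) are illustrative assumptions, not identifiers from the paper.

    # Hypothetical sketch of worst-case adversarial regularization for a
    # DQN-style agent (PyTorch). Not the authors' code; names are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def worst_case_perturbation(q_net: nn.Module, state: torch.Tensor,
                                epsilon: float) -> torch.Tensor:
        """FGSM-style step toward the state's worst action: perturb the
        input so the Q-value of the lowest-valued action increases."""
        state = state.clone().detach().requires_grad_(True)
        q_values = q_net(state)                       # (batch, n_actions)
        worst = q_values.argmin(dim=1, keepdim=True)  # worst action per state
        loss = q_values.gather(1, worst).sum()        # raise worst action's Q
        grad = torch.autograd.grad(loss, state)[0]
        return (state + epsilon * grad.sign()).detach()

    def consistency_loss(q_net: nn.Module, state: torch.Tensor,
                         adv_state: torch.Tensor) -> torch.Tensor:
        """KL regularizer: the action distribution implied by the perturbed
        input should match the (detached) distribution from the clean input."""
        p_clean = F.softmax(q_net(state), dim=1).detach()
        log_p_adv = F.log_softmax(q_net(adv_state), dim=1)
        return F.kl_div(log_p_adv, p_clean, reduction="batchmean")

    # Inside a training step, the regularizer would be added to the usual
    # TD loss:
    #   adv_s = worst_case_perturbation(q_net, s, epsilon=0.01)
    #   loss  = td_loss + reg_weight * consistency_loss(q_net, s, adv_s)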

Bibliographic Details
Main Authors: Kohei Ohashi, Kosuke Nakanishi, Yuji Yasui, Shin Ishii
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Deep reinforcement learning; adversarial example; robustness; regularization
Online Access: https://ieeexplore.ieee.org/document/10250423/
_version_ 1797676722305368064
author Kohei Ohashi
Kosuke Nakanishi
Yuji Yasui
Shin Ishii
author_facet Kohei Ohashi
Kosuke Nakanishi
Yuji Yasui
Shin Ishii
author_sort Kohei Ohashi
collection DOAJ
description Over the last decade, methods for autonomous control by artificial intelligence have been extensively developed based on deep reinforcement learning (DRL) technologies. Despite these advances, however, robustness to noise in observation data remains an issue for autonomous control policies implemented using DRL in practical applications. In this study, we present a general robust adversarial learning technique applicable to DRL. During the adversarial learning process, policies are trained through regularization learning to output consistent control actions, even for adversarial input examples. Importantly, these adversarial examples are crafted to lead the current policy to predict the worst action at each state. Although a naive implementation of regularization learning may cause the DRL model to learn a biased objective function, our methods were found to minimize this bias. When implemented as modifications of a deep Q-network for discrete-action problems in Atari 2600 games and of a deep deterministic policy gradient for continuous-action tasks in PyBullet, our new adversarial learning frameworks showed significantly enhanced robustness against adversarial and random noise added to the input, compared with several recently proposed methods.
first_indexed 2024-03-11T22:34:14Z
format Article
id doaj.art-f646c54671da448f9ed62a431b72a8f7
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-11T22:34:14Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-f646c54671da448f9ed62a431b72a8f7
2023-09-22T23:01:17Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
Volume 11, pp. 100798-100809
doi: 10.1109/ACCESS.2023.3314750
IEEE document ID: 10250423
Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
Kohei Ohashi, Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Kosuke Nakanishi (https://orcid.org/0000-0002-0078-6942), Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Yuji Yasui, Honda Research and Development Company Ltd., Saitama, Japan
Shin Ishii, Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
https://ieeexplore.ieee.org/document/10250423/
Keywords: Deep reinforcement learning; adversarial example; robustness; regularization
spellingShingle Kohei Ohashi
Kosuke Nakanishi
Yuji Yasui
Shin Ishii
Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
IEEE Access
Deep reinforcement learning
adversarial example
robustness
regularization
title Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_full Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_fullStr Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_full_unstemmed Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_short Deep Adversarial Reinforcement Learning Method to Generate Control Policies Robust Against Worst-Case Value Predictions
title_sort deep adversarial reinforcement learning method to generate control policies robust against worst case value predictions
topic Deep reinforcement learning
adversarial example
robustness
regularization
url https://ieeexplore.ieee.org/document/10250423/
work_keys_str_mv AT koheiohashi deepadversarialreinforcementlearningmethodtogeneratecontrolpoliciesrobustagainstworstcasevaluepredictions
AT kosukenakanishi deepadversarialreinforcementlearningmethodtogeneratecontrolpoliciesrobustagainstworstcasevaluepredictions
AT yujiyasui deepadversarialreinforcementlearningmethodtogeneratecontrolpoliciesrobustagainstworstcasevaluepredictions
AT shinishii deepadversarialreinforcementlearningmethodtogeneratecontrolpoliciesrobustagainstworstcasevaluepredictions