Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification

Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. Adversarial training (AT) is, so far, the only method that can guarantee the robustness of DNNs to adversarial attacks. However, the robustness generalization accuracy that AT achieves is still far lower than the standard...

Bibliographic Details
Main Authors: Desheng Wang, Weidong Jin, Yunpu Wu
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Sensors
Subjects: adversarial training, between-class learning, robustness, regularization
Online Access: https://www.mdpi.com/1424-8220/23/6/3252
_version_ 1827747626254073856
author Desheng Wang
Weidong Jin
Yunpu Wu
author_facet Desheng Wang
Weidong Jin
Yunpu Wu
author_sort Desheng Wang
collection DOAJ
description Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. Adversarial training (AT) is, so far, the only method that can guarantee the robustness of DNNs to adversarial attacks. However, the robustness generalization accuracy that AT achieves is still far lower than the standard generalization accuracy of an undefended model, and an adversarially trained model is known to trade standard generalization accuracy against robustness generalization accuracy. To improve the trade-off between robustness generalization and standard generalization in AT, we propose a novel defense algorithm called Between-Class Adversarial Training (BCAT) that combines Between-Class learning (BC-learning) with standard AT. Specifically, BCAT mixes two adversarial examples from different classes and trains the model on the mixed between-class adversarial examples instead of the original adversarial examples during AT. We further propose BCAT+, which adopts a more powerful mixing method. BCAT and BCAT+ impose effective regularization on the feature distribution of adversarial examples to enlarge the between-class distance, thus improving both the robustness generalization and the standard generalization performance of AT. The proposed algorithms do not introduce any hyperparameters into standard AT; therefore, no additional hyperparameter search is needed. We evaluate the proposed algorithms under both white-box and black-box attacks using a spectrum of perturbation values on the CIFAR-10, CIFAR-100, and SVHN datasets. The results indicate that our algorithms achieve better global robustness generalization performance than state-of-the-art adversarial defense methods.
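The mixing step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the mixing formula, so the code below assumes plain linear mixing with a ratio drawn uniformly from [0, 1) and a matching soft label; it is not the authors' implementation. The function name `mix_between_class` and its parameters are hypothetical, and the adversarial examples are assumed to have been generated beforehand by whatever attack the AT procedure uses.

```python
import numpy as np


def mix_between_class(x_adv, y, num_classes, rng):
    """Mix each adversarial example with one from a different class.

    Assumption (not stated in the abstract): linear mixing with a ratio
    r ~ U[0, 1) and a soft label r * onehot(y1) + (1 - r) * onehot(y2).
    """
    x_adv = np.asarray(x_adv, dtype=np.float64)
    y = np.asarray(y)
    n = len(y)

    # For every sample, pick a partner whose class label differs.
    partners = np.empty(n, dtype=np.int64)
    for i in range(n):
        candidates = np.flatnonzero(y != y[i])
        if candidates.size == 0:
            raise ValueError("batch must contain at least two classes")
        partners[i] = rng.choice(candidates)

    # One mixing ratio per sample, broadcast over the image dimensions.
    r = rng.uniform(0.0, 1.0, size=n)
    r_x = r.reshape((n,) + (1,) * (x_adv.ndim - 1))
    x_mix = r_x * x_adv + (1.0 - r_x) * x_adv[partners]

    # Soft labels carry the same ratio as the inputs.
    one_hot = np.eye(num_classes)
    y_mix = r[:, None] * one_hot[y] + (1.0 - r[:, None]) * one_hot[y[partners]]
    return x_mix, y_mix


# Usage with random stand-ins for a batch of adversarial images.
rng = np.random.default_rng(0)
x = rng.random((8, 3, 4, 4))
y = np.array([0, 1, 2, 0, 1, 2, 0, 1])
x_mix, y_mix = mix_between_class(x, y, num_classes=3, rng=rng)
```

Under this assumption the mixed batch would replace the original adversarial batch in the training loss, with the soft labels as targets. Drawing the ratio from a fixed uniform distribution is consistent with the abstract's statement that no new hyperparameters are introduced.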
first_indexed 2024-03-11T05:55:26Z
format Article
id doaj.art-86558422308a46baa95f5f3f2f56691e
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-11T05:55:26Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-86558422308a46baa95f5f3f2f56691e
2023-11-17T13:48:05Z
eng
MDPI AG
Sensors
1424-8220
2023-03-01
23(6), 3252
10.3390/s23063252
Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
Desheng Wang, School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
Weidong Jin, School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
Yunpu Wu, School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, China
https://www.mdpi.com/1424-8220/23/6/3252
adversarial training
between-class learning
robustness
regularization
spellingShingle Desheng Wang
Weidong Jin
Yunpu Wu
Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
Sensors
adversarial training
between-class learning
robustness
regularization
title Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_full Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_fullStr Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_full_unstemmed Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_short Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
title_sort between class adversarial training for improving adversarial robustness of image classification
topic adversarial training
between-class learning
robustness
regularization
url https://www.mdpi.com/1424-8220/23/6/3252
work_keys_str_mv AT deshengwang betweenclassadversarialtrainingforimprovingadversarialrobustnessofimageclassification
AT weidongjin betweenclassadversarialtrainingforimprovingadversarialrobustnessofimageclassification
AT yunpuwu betweenclassadversarialtrainingforimprovingadversarialrobustnessofimageclassification