Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
Deep neural networks (DNNs) have been known to be vulnerable to adversarial attacks. Adversarial training (AT) is, so far, the only method that can guarantee the robustness of DNNs to adversarial attacks. However, the robustness generalization accuracy gain of AT is still far lower than the standard...
Main Authors: | Desheng Wang, Weidong Jin, Yunpu Wu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Sensors |
Subjects: | adversarial training; between-class learning; robustness; regularization |
Online Access: | https://www.mdpi.com/1424-8220/23/6/3252 |
_version_ | 1827747626254073856 |
---|---|
author | Desheng Wang, Weidong Jin, Yunpu Wu |
author_facet | Desheng Wang, Weidong Jin, Yunpu Wu |
author_sort | Desheng Wang |
collection | DOAJ |
description | Deep neural networks (DNNs) have been known to be vulnerable to adversarial attacks. Adversarial training (AT) is, so far, the only method that can guarantee the robustness of DNNs to adversarial attacks. However, the robustness generalization accuracy gain of AT is still far lower than the standard generalization accuracy of an undefended model, and there is a known trade-off between the standard generalization accuracy and the robustness generalization accuracy of an adversarially trained model. To improve the trade-off between robustness generalization and standard generalization in AT, we propose a novel defense algorithm called Between-Class Adversarial Training (BCAT) that combines Between-Class learning (BC-learning) with standard AT. Specifically, BCAT mixes two adversarial examples from different classes and trains the model on the mixed between-class adversarial examples instead of the original adversarial examples during AT. We further propose BCAT+, which adopts a more powerful mixing method. BCAT and BCAT+ impose effective regularization on the feature distribution of adversarial examples to enlarge the between-class distance, thus improving both the robustness generalization and the standard generalization performance of AT. The proposed algorithms do not introduce any hyperparameters into standard AT; therefore, no additional hyperparameter search is needed. We evaluate the proposed algorithms under both white-box and black-box attacks using a spectrum of perturbation values on the CIFAR-10, CIFAR-100, and SVHN datasets. The results indicate that our algorithms achieve better global robustness generalization performance than state-of-the-art adversarial defense methods. |
first_indexed | 2024-03-11T05:55:26Z |
format | Article |
id | doaj.art-86558422308a46baa95f5f3f2f56691e |
institution | Directory of Open Access Journals |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-11T05:55:26Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-86558422308a46baa95f5f3f2f56691e; 2023-11-17T13:48:05Z; eng; MDPI AG; Sensors; 1424-8220; 2023-03-01; 23(6), 3252; 10.3390/s23063252; Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification; Desheng Wang (School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China); Weidong Jin (School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China); Yunpu Wu (School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, China); https://www.mdpi.com/1424-8220/23/6/3252; adversarial training; between-class learning; robustness; regularization |
title | Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification |
title_sort | between class adversarial training for improving adversarial robustness of image classification |
topic | adversarial training; between-class learning; robustness; regularization |
url | https://www.mdpi.com/1424-8220/23/6/3252 |
work_keys_str_mv | AT deshengwang betweenclassadversarialtrainingforimprovingadversarialrobustnessofimageclassification AT weidongjin betweenclassadversarialtrainingforimprovingadversarialrobustnessofimageclassification AT yunpuwu betweenclassadversarialtrainingforimprovingadversarialrobustnessofimageclassification |
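
The description says that BCAT mixes two adversarial examples from different classes and trains on the mixture. The record does not give the mixing rule, so the following is only a minimal sketch under stated assumptions: a plain linear mix with ratio `r`, soft labels mixed in the same proportion, and inputs that are already adversarial examples flattened to lists of floats. The function name `mix_between_class` and all of its parameters are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def mix_between_class(
    x_adv_a: Sequence[float],
    x_adv_b: Sequence[float],
    label_a: int,
    label_b: int,
    num_classes: int,
    r: float,
) -> Tuple[List[float], List[float]]:
    """Mix two adversarial examples from different classes.

    Assumption (not taken from the record): the mix is linear,
    r * x_adv_a + (1 - r) * x_adv_b, and the training target is a
    soft label with weight r on label_a and 1 - r on label_b.
    """
    if label_a == label_b:
        raise ValueError("between-class mixing expects two different classes")
    if len(x_adv_a) != len(x_adv_b):
        raise ValueError("both examples must have the same number of values")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")

    # Element-wise convex combination of the two adversarial inputs.
    mixed_x = [r * a + (1.0 - r) * b for a, b in zip(x_adv_a, x_adv_b)]

    # Soft label that carries the same mixing ratio as the inputs.
    soft_label = [0.0] * num_classes
    soft_label[label_a] = r
    soft_label[label_b] = 1.0 - r
    return mixed_x, soft_label


# Toy usage: two 2-value "images" from classes 0 and 2, out of 3 classes.
mixed_x, soft_label = mix_between_class([0.0, 1.0], [1.0, 0.0], 0, 2, 3, r=0.25)
```

In a training loop, `r` would typically be drawn at random for each pair and the model fitted to `soft_label` with a loss that accepts soft targets; both choices are assumptions here rather than details confirmed by this record.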