A Textual Backdoor Defense Method Based on Deep Feature Classification
Natural language processing (NLP) models based on deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor defense methods have limited effectiveness and coverage scenarios. We propose a textual backdoor defense method based on deep feature classification. The method include...
Main Authors: | Kun Shao, Junan Yang, Pengjiang Hu, Xiaoshuai Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-01-01 |
Series: | Entropy |
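Subjects: | deep neural networks, natural language processing, adversarial machine learning, backdoor attacks, backdoor defenses |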
Online Access: | https://www.mdpi.com/1099-4300/25/2/220 |
_version_ | 1811153815131389952 |
---|---|
author | Kun Shao Junan Yang Pengjiang Hu Xiaoshuai Li |
author_facet | Kun Shao Junan Yang Pengjiang Hu Xiaoshuai Li |
author_sort | Kun Shao |
collection | DOAJ |
description | Natural language processing (NLP) models based on deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor defense methods have limited effectiveness and coverage scenarios. We propose a textual backdoor defense method based on deep feature classification. The method includes deep feature extraction and classifier construction. The method exploits the distinguishability of deep features of poisoned data and benign data. Backdoor defense is implemented in both offline and online scenarios. We conducted defense experiments on two datasets and two models for a variety of backdoor attacks. The experimental results demonstrate the effectiveness of this defense approach and show that it outperforms the baseline defense method. |
first_indexed | 2024-03-11T08:52:03Z |
format | Article |
id | doaj.art-9fb1329d11d34433b6507f578075a6be |
institution | Directory Open Access Journal |
issn | 1099-4300 |
language | English |
last_indexed | 2024-03-11T08:52:03Z |
publishDate | 2023-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Entropy |
spelling | doaj.art-9fb1329d11d34433b6507f578075a6be 2023-11-16T20:22:29Z eng MDPI AG Entropy 1099-4300 2023-01-01 25 2 220 10.3390/e25020220 A Textual Backdoor Defense Method Based on Deep Feature Classification Kun Shao 0 Junan Yang 1 Pengjiang Hu 2 Xiaoshuai Li 3 College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China Natural language processing (NLP) models based on deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor defense methods have limited effectiveness and coverage scenarios. We propose a textual backdoor defense method based on deep feature classification. The method includes deep feature extraction and classifier construction. The method exploits the distinguishability of deep features of poisoned data and benign data. Backdoor defense is implemented in both offline and online scenarios. We conducted defense experiments on two datasets and two models for a variety of backdoor attacks. The experimental results demonstrate the effectiveness of this defense approach and show that it outperforms the baseline defense method. https://www.mdpi.com/1099-4300/25/2/220 deep neural networks natural language processing adversarial machine learning backdoor attacks backdoor defenses |
spellingShingle | Kun Shao Junan Yang Pengjiang Hu Xiaoshuai Li A Textual Backdoor Defense Method Based on Deep Feature Classification Entropy deep neural networks natural language processing adversarial machine learning backdoor attacks backdoor defenses |
title | A Textual Backdoor Defense Method Based on Deep Feature Classification |
title_full | A Textual Backdoor Defense Method Based on Deep Feature Classification |
title_fullStr | A Textual Backdoor Defense Method Based on Deep Feature Classification |
title_full_unstemmed | A Textual Backdoor Defense Method Based on Deep Feature Classification |
title_short | A Textual Backdoor Defense Method Based on Deep Feature Classification |
title_sort | textual backdoor defense method based on deep feature classification |
topic | deep neural networks natural language processing adversarial machine learning backdoor attacks backdoor defenses |
url | https://www.mdpi.com/1099-4300/25/2/220 |
work_keys_str_mv | AT kunshao atextualbackdoordefensemethodbasedondeepfeatureclassification AT junanyang atextualbackdoordefensemethodbasedondeepfeatureclassification AT pengjianghu atextualbackdoordefensemethodbasedondeepfeatureclassification AT xiaoshuaili atextualbackdoordefensemethodbasedondeepfeatureclassification AT kunshao textualbackdoordefensemethodbasedondeepfeatureclassification AT junanyang textualbackdoordefensemethodbasedondeepfeatureclassification AT pengjianghu textualbackdoordefensemethodbasedondeepfeatureclassification AT xiaoshuaili textualbackdoordefensemethodbasedondeepfeatureclassification |