A Textual Backdoor Defense Method Based on Deep Feature Classification

Bibliographic Details
Main Authors: Kun Shao, Junan Yang, Pengjiang Hu, Xiaoshuai Li
Format: Article
Language: English
Published: MDPI AG, 2023-01-01
Series: Entropy, vol. 25, no. 2, article 220
ISSN: 1099-4300
DOI: 10.3390/e25020220
Affiliation (all authors): College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
Subjects: deep neural networks; natural language processing; adversarial machine learning; backdoor attacks; backdoor defenses
Collection: DOAJ (Directory of Open Access Journals)
Online Access: https://www.mdpi.com/1099-4300/25/2/220

Abstract

Natural language processing (NLP) models based on deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor defense methods have limited effectiveness and cover only a limited range of scenarios. We propose a textual backdoor defense method based on deep feature classification. The method consists of deep feature extraction and classifier construction, and it exploits the fact that the deep features of poisoned data are distinguishable from those of benign data. Backdoor defense is implemented in both offline and online scenarios. We conducted defense experiments on two datasets and two models against a variety of backdoor attacks. The experimental results demonstrate that the approach is effective and that it outperforms the baseline defense method.
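The abstract describes the defense only at a high level: extract deep features, then train a classifier that separates poisoned from benign samples, and apply it offline and online. The sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. It assumes deep features have already been extracted (for example, a hidden-layer representation from the suspect model; extraction is not shown), assumes the defender has some labeled poisoned and benign features to train on (how the paper obtains them is not stated in this record), uses plain logistic regression as a stand-in for whatever classifier the paper constructs, and uses synthetic 2-D toy vectors instead of real features. All names (`FeatureClassifier`, `offline_filter`, `online_guard`, `make_toy_features`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a deep feature vector (e.g., a hidden-layer
# embedding taken from the suspect model); extraction itself is not shown.
Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class FeatureClassifier:
    """Logistic-regression detector over deep features (label 1 = poisoned).

    Assumption: a plain linear classifier stands in for whatever classifier
    the paper constructs.
    """

    def __init__(self, dim: int, lr: float = 0.1, epochs: int = 200) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def score(self, x: Vector) -> float:
        """Estimated probability that the input behind x is poisoned."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, xs: List[Vector], ys: List[int]) -> "FeatureClassifier":
        """Full-batch gradient descent on the mean log-loss."""
        n = len(xs)
        for _ in range(self.epochs):
            gw = [0.0] * len(self.w)
            gb = 0.0
            for x, y in zip(xs, ys):
                err = self.score(x) - y  # d(log-loss)/d(logit)
                for j, xj in enumerate(x):
                    gw[j] += err * xj
                gb += err
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def is_poisoned(self, x: Vector, threshold: float = 0.5) -> bool:
        return self.score(x) >= threshold


def offline_filter(clf: FeatureClassifier, features: List[Vector],
                   samples: List[str], threshold: float = 0.5) -> List[str]:
    """Offline use: drop training samples whose features are flagged."""
    return [s for f, s in zip(features, samples)
            if not clf.is_poisoned(f, threshold)]


def online_guard(clf: FeatureClassifier, feature: Vector,
                 threshold: float = 0.5) -> bool:
    """Online use: True means reject this incoming input."""
    return clf.is_poisoned(feature, threshold)


def make_toy_features(n_per_class: int,
                      seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Synthetic 2-D 'features': benign near (0, 0), poisoned near (3, 3)."""
    rng = random.Random(seed)
    feats: List[List[float]] = []
    labels: List[int] = []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            feats.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
            labels.append(label)
    return feats, labels


if __name__ == "__main__":
    feats, labels = make_toy_features(50)
    clf = FeatureClassifier(dim=2).fit(feats, labels)
    texts = [f"sample-{i}" for i in range(len(feats))]
    kept = offline_filter(clf, feats, texts)
    print(f"offline: kept {len(kept)} of {len(texts)} samples")
    print("online: reject (3, 3)?", online_guard(clf, [3.0, 3.0]))
```

Thresholding one detector score covers both deployment modes named in the abstract: offline, it cleans a possibly poisoned training set before (re)training; online, it screens each input at inference time. The 0.5 threshold and the toy feature geometry are arbitrary illustrative choices, not values from the paper.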