A Textual Backdoor Defense Method Based on Deep Feature Classification


Bibliographic Details
Main Authors:	Kun Shao, Junan Yang, Pengjiang Hu, Xiaoshuai Li
Format:	Article
Language:	English
Published:	MDPI AG 2023-01-01
Series:	Entropy
Subjects:	deep neural networks; natural language processing; adversarial machine learning; backdoor attacks; backdoor defenses
Online Access:	https://www.mdpi.com/1099-4300/25/2/220
ISSN:	1099-4300
Volume:	25, Issue 2, Article 220
DOI:	10.3390/e25020220
Affiliation:	College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China (all four authors)

Abstract

Natural language processing (NLP) models based on deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor defense methods offer limited effectiveness and cover only a limited range of scenarios. We propose a textual backdoor defense method based on deep feature classification. The method consists of deep feature extraction and classifier construction, and it exploits the distinguishability between the deep features of poisoned data and those of benign data. Backdoor defense is implemented in both offline and online scenarios. We conducted defense experiments on two datasets and two models against a variety of backdoor attacks. The experimental results demonstrate the effectiveness of this defense approach, which outperforms the baseline defense method.
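The abstract says only that the method extracts deep features, trains a classifier to separate poisoned from benign data, and applies that classifier offline and online. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. The feature vectors are synthetic stand-ins for hidden-layer activations of an NLP model, a simple nearest-centroid rule replaces whatever classifier the paper actually builds, the defender is assumed to have a labeled reference set of benign and poisoned features to fit on, and the names `NearestCentroidDetector`, `filter_offline` and `guard_online` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, int]  # (deep-feature vector, task label)


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equally sized feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroidDetector:
    """Hypothetical stand-in classifier: flags a feature vector as poisoned
    when it lies closer to the poisoned centroid than to the benign one."""

    def fit(self, benign: List[Vector], poisoned: List[Vector]) -> "NearestCentroidDetector":
        self.benign_c = centroid(benign)
        self.poisoned_c = centroid(poisoned)
        return self

    def is_poisoned(self, features: Vector) -> bool:
        return distance(features, self.poisoned_c) < distance(features, self.benign_c)


def filter_offline(detector: NearestCentroidDetector, dataset: List[Sample]) -> List[Sample]:
    """Offline scenario (assumed form): drop suspected poisoned training samples."""
    return [s for s in dataset if not detector.is_poisoned(s[0])]


def guard_online(detector: NearestCentroidDetector, features: Vector) -> str:
    """Online scenario (assumed form): screen one input's features at inference time."""
    return "reject" if detector.is_poisoned(features) else "accept"


# Synthetic deep features (assumption): benign samples cluster near 0 and
# poisoned samples near 3, mimicking the separability the abstract reports.
benign = [[(i % 5) * 0.1, ((i * 3) % 7) * 0.1, 0.0, 0.2] for i in range(20)]
poisoned = [[3.0 + (i % 3) * 0.1, 3.0, 2.9, 3.1] for i in range(5)]

detector = NearestCentroidDetector().fit(benign, poisoned)
# Poisoned samples carry the attacker's (assumed) target label 1.
dataset = [(f, i % 2) for i, f in enumerate(benign)] + [(f, 1) for f in poisoned]
clean = filter_offline(detector, dataset)
print(len(clean))                                    # 20
print(guard_online(detector, [3.0, 3.0, 3.0, 3.0]))  # reject
print(guard_online(detector, [0.1, 0.2, 0.0, 0.2]))  # accept
```

A nearest-centroid rule was chosen here only because it needs no third-party libraries and makes the "distinguishable deep features" assumption easy to see; a realistic setup would substitute the model's actual hidden-layer features and a stronger classifier.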

Similar Items