Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks

The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks....

Full description

Bibliographic Details
Main Authors:	Zhang, Zhengyan, Xiao, Guangxuan, Li, Yongwei, Lv, Tian, Qi, Fanchao, Liu, Zhiyuan, Wang, Yasheng, Jiang, Xin, Sun, Maosong
Format:	Article
Language:	English
Published:	Springer Science and Business Media LLC 2024
Online Access:	https://hdl.handle.net/1721.1/155692

_version_	1824458242998140928
author	Zhang, Zhengyan Xiao, Guangxuan Li, Yongwei Lv, Tian Qi, Fanchao Liu, Zhiyuan Wang, Yasheng Jiang, Xin Sun, Maosong
author_facet	Zhang, Zhengyan Xiao, Guangxuan Li, Yongwei Lv, Tian Qi, Fanchao Liu, Zhiyuan Wang, Yasheng Jiang, Xin Sun, Maosong
author_sort	Zhang, Zhengyan
collection	MIT
description	The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In the experiments of both natural language processing (NLP) and computer vision (CV), we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.
first_indexed	2024-09-23T13:44:40Z
format	Article
id	mit-1721.1/155692
institution	Massachusetts Institute of Technology
language	English
last_indexed	2025-02-19T04:22:47Z
publishDate	2024
publisher	Springer Science and Business Media LLC
record_format	dspace
spelling	mit-1721.1/1556922025-01-03T05:03:23Z Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks Zhang, Zhengyan Xiao, Guangxuan Li, Yongwei Lv, Tian Qi, Fanchao Liu, Zhiyuan Wang, Yasheng Jiang, Xin Sun, Maosong The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In the experiments of both natural language processing (NLP) and computer vision (CV), we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons. 2024-07-16T15:26:52Z 2024-07-16T15:26:52Z 2023-03-02 2024-07-14T03:17:06Z Article http://purl.org/eprint/type/JournalArticle 2731-538X 2731-5398 https://hdl.handle.net/1721.1/155692 Zhang, Z., Xiao, G., Li, Y. et al. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks. Mach. Intell. Res. 20, 180–193 (2023). PUBLISHER_CC en 10.1007/s11633-022-1377-5 Machine Intelligence Research Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/ The Author(s), corrected publication application/pdf Springer Science and Business Media LLC Springer Berlin Heidelberg
spellingShingle	Zhang, Zhengyan Xiao, Guangxuan Li, Yongwei Lv, Tian Qi, Fanchao Liu, Zhiyuan Wang, Yasheng Jiang, Xin Sun, Maosong Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks
title	Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks
title_full	Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks
title_fullStr	Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks
title_full_unstemmed	Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks
title_short	Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks
title_sort	red alarm for pre trained models universal vulnerability to neuron level backdoor attacks
url	https://hdl.handle.net/1721.1/155692
work_keys_str_mv	AT zhangzhengyan redalarmforpretrainedmodelsuniversalvulnerabilitytoneuronlevelbackdoorattacks AT xiaoguangxuan redalarmforpretrainedmodelsuniversalvulnerabilitytoneuronlevelbackdoorattacks AT liyongwei redalarmforpretrainedmodelsuniversalvulnerabilitytoneuronlevelbackdoorattacks AT lvtian redalarmforpretrainedmodelsuniversalvulnerabilitytoneuronlevelbackdoorattacks AT qifanchao redalarmforpretrainedmodelsuniversalvulnerabilitytoneuronlevelbackdoorattacks AT liuzhiyuan redalarmforpretrainedmodelsuniversalvulnerabilitytoneuronlevelbackdoorattacks AT wangyasheng redalarmforpretrainedmodelsuniversalvulnerabilitytoneuronlevelbackdoorattacks AT jiangxin redalarmforpretrainedmodelsuniversalvulnerabilitytoneuronlevelbackdoorattacks AT sunmaosong redalarmforpretrainedmodelsuniversalvulnerabilitytoneuronlevelbackdoorattacks

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks

Similar Items