Towards Deployable Robust Text Classifiers

Bibliographic Details
Main Author: Xu, Lei
Other Authors: Veeramachaneni, Kalyan
Department: Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Online Access: https://hdl.handle.net/1721.1/150071
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)

Description

Text classification has been studied for decades as a fundamental task in natural language processing. Deploying classifiers enables more efficient information processing, which is useful for various applications, including decision-making. However, classifiers also present challenging and long-standing problems. As their use increases, expectations about their robustness, fairness, accuracy, and other qualities rise in turn. In this dissertation, we aim to develop more deployable and robust text classifiers, focusing on improving classifier robustness against adversarial attacks by developing both attack and defense approaches.

Adversarial attacks are a security concern for text classifiers: a malicious user takes a sentence and perturbs it slightly to manipulate the classifier's output. To design more effective attack methods, we focus first on improving adversarial sentence quality. Unlike existing methods, which prioritize misclassification and ignore sentence similarity and fluency, we synthesize these three criteria into a combined critique score. We then outline a rewrite-and-rollback framework that optimizes this score, achieving state-of-the-art attack success rates while improving similarity and fluency. We focus second on computational requirements. Existing methods typically use combinatorial search to find adversarial examples that alter multiple words, which is inefficient and requires many queries to the classifier. We overcome this problem by proposing a single-word adversarial perturbation attack, which replaces a single word in the original sentence with a high-adversarial-capacity word, significantly improving efficiency while maintaining an attack success rate similar to that of existing methods.

We then turn to defense. Currently, the most common approach for defending against attacks is to train classifiers using adversarial examples as data augmentation, a method limited by the inefficiency of many attack methods. We show that training classifiers with data augmentation generated by our efficient single-word perturbation attack can improve their robustness against other attack methods. We also design in situ data augmentation to counteract adversarial perturbations in the classifier input: we use the gradient norm to identify keywords for classification and a pre-trained language model to replace them. This in situ augmentation effectively improves robustness and does not require tuning the classifier.

Finally, we explore the vulnerability of a very recent text classification architecture, prompt-based classifiers, and find that they are vulnerable to attacks as well. We also develop a library called Fibber to facilitate adversarial robustness research.
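
To make the combined critique score concrete, the sketch below shows one plausible way to fold misclassification, similarity, and fluency into a single number. It is an illustrative reconstruction, not the formulation from the thesis: the component functions (classify, similarity, log_perplexity) and the weights are assumptions.

```python
# Illustrative sketch of a combined critique score for an adversarial
# candidate. The component functions and weights are assumptions, not
# the thesis's actual formulation.

def critique_score(original, candidate, true_label,
                   classify, similarity, log_perplexity,
                   w_adv=1.0, w_sim=1.0, w_flu=0.1):
    """Score a candidate adversarial sentence; higher is better for the attacker."""
    probs = classify(candidate)            # class-probability vector
    adv = 1.0 - probs[true_label]          # misclassification pressure
    sim = similarity(original, candidate)  # e.g., sentence-embedding cosine
    flu = -log_perplexity(candidate)       # fluency under a language model
    return w_adv * adv + w_sim * sim + w_flu * flu
```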
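
The single-word perturbation attack admits an equally small sketch. The version below is a naive brute-force rendering under assumptions: predict is a black-box classifier returning a label, and candidate_words stands in for the thesis's list of high-adversarial-capacity words.

```python
# Naive sketch of a single-word substitution attack: try each position
# with each candidate word and keep the first substitution that flips
# the classifier's prediction. `predict` and `candidate_words` are
# assumptions for illustration.

def single_word_attack(sentence, true_label, predict, candidate_words):
    words = sentence.split()
    for i in range(len(words)):
        for cand in candidate_words:
            perturbed = " ".join(words[:i] + [cand] + words[i + 1:])
            if predict(perturbed) != true_label:  # prediction flipped
                return perturbed                  # adversarial example found
    return None                                   # attack failed
```

Each candidate substitution costs one classifier query, so this sketch's query budget is at most len(words) × len(candidate_words), which illustrates why restricting the perturbation to a single word keeps the attack cheap.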
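
The in situ augmentation step (gradient-norm keyword selection followed by language-model replacement) can be sketched with standard PyTorch and Hugging Face components. The model names, the single-token replacement policy, and the choice of backpropagated quantity are all assumptions; this is a minimal sketch of the idea, not the thesis's implementation.

```python
# Minimal sketch of in situ augmentation: (1) rank tokens by the norm of
# the gradient of the top-class logit w.r.t. their embeddings, (2) mask
# the top-ranked token and let a masked language model fill it in.
# Model names and the one-token policy are illustrative assumptions.
import torch
from transformers import (AutoTokenizer,
                          AutoModelForSequenceClassification, pipeline)

clf_name = "textattack/bert-base-uncased-SST-2"  # assumed example classifier
tokenizer = AutoTokenizer.from_pretrained(clf_name)
classifier = AutoModelForSequenceClassification.from_pretrained(clf_name)
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def in_situ_augment(sentence):
    enc = tokenizer(sentence, return_tensors="pt")
    embeds = classifier.get_input_embeddings()(enc["input_ids"])
    embeds.retain_grad()
    logits = classifier(inputs_embeds=embeds,
                        attention_mask=enc["attention_mask"]).logits
    logits.max().backward()                      # gradient of top-class logit
    grad_norm = embeds.grad.norm(dim=-1).squeeze(0)
    grad_norm[0] = grad_norm[-1] = 0.0           # ignore [CLS] and [SEP]
    keyword = int(grad_norm.argmax())            # most influential token
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    tokens[keyword] = tokenizer.mask_token
    masked = tokenizer.convert_tokens_to_string(tokens[1:-1])
    return fill_mask(masked)[0]["sequence"]      # LM's top replacement
```

Because the replacement happens on the input at inference time, the classifier itself is never retrained, which matches the abstract's point that in situ augmentation does not require tuning the classifier.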