Dual-Targeted Textfooler Attack on Text Classification Systems

Deep neural networks provide good performance on classification tasks such as those for image, audio, and text classification. However, such neural networks are vulnerable to adversarial examples. An adversarial example is a sample created by adding a small adversarial noise to an original data sample in such a way that it will be correctly classified by a human but misclassified by a deep neural network. Studies on adversarial examples have focused mainly on the image field, but research is expanding into the text field as well. Adversarial examples in the text field that are designed with two targets in mind can be useful in certain situations. In a military scenario, for example, if enemy models A and B use a text recognition model, it may be desirable to cause enemy model A tanks to go to the right and enemy model B self-propelled guns to go to the left by using strategically designed adversarial messages. Such a dual-targeted adversarial example could accomplish this by causing different misclassifications in different models, in contrast to single-target adversarial examples produced by existing methods. In this paper, I propose a method for creating a dual-targeted textual adversarial example for attacking a text classification system. Unlike the existing adversarial methods, which are designed for images, the proposed method creates dual-targeted adversarial examples that will be misclassified as a different class by each of two models while maintaining the meaning and grammar of the original sentence, by substituting words of importance. Experiments were conducted using the SNLI dataset and the TensorFlow library. The results demonstrate that the proposed method can generate a dual-targeted adversarial example with an average attack success rate of 82.2% on the two models.
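The abstract describes the attack only at a high level: rank words by importance, then substitute them so that each of two victim models emits a different target label. The following is a minimal sketch of that idea, assuming a drop-one word-importance score and a greedy synonym-substitution loop; the toy classifiers, the synonym table, and the example sentence are hypothetical stand-ins, not the paper's TextFooler-based procedure or its SNLI-trained models.

```python
# Minimal, illustrative sketch (not the paper's code): a greedy word-importance
# substitution loop that nudges two classifiers toward two different target
# labels. The classifiers, synonym table, and sentence below are toy stand-ins.
from typing import Callable, Dict, List

Classifier = Callable[[str], Dict[str, float]]  # text -> label probabilities

def toy_classifier(bias_word: str, biased_label: str) -> Classifier:
    """Stand-in model that leans toward `biased_label` when `bias_word` appears."""
    def predict(text: str) -> Dict[str, float]:
        labels = ["entailment", "neutral", "contradiction"]
        probs = {label: 1.0 for label in labels}
        if bias_word in text.lower():
            probs[biased_label] += 2.0
        total = sum(probs.values())
        return {label: p / total for label, p in probs.items()}
    return predict

def word_importance(words: List[str], model: Classifier, target: str) -> List[float]:
    """Score each word by how much deleting it lowers the target-label probability."""
    base = model(" ".join(words))[target]
    return [base - model(" ".join(words[:i] + words[i + 1:]))[target]
            for i in range(len(words))]

def dual_targeted_attack(sentence: str,
                         model_a: Classifier, target_a: str,
                         model_b: Classifier, target_b: str,
                         synonyms: Dict[str, List[str]]) -> str:
    """Greedily replace important words until both models emit their target labels."""
    words = sentence.split()
    combined = [sa + sb for sa, sb in zip(word_importance(words, model_a, target_a),
                                          word_importance(words, model_b, target_b))]
    for i in sorted(range(len(words)), key=lambda k: combined[k], reverse=True):
        best_word, best_score = None, None
        for candidate in synonyms.get(words[i].lower(), []):
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            # Dual-target objective: sum of both models' target-label probabilities.
            score = model_a(trial)[target_a] + model_b(trial)[target_b]
            if best_score is None or score > best_score:
                best_word, best_score = candidate, score
        if best_word is not None:
            words[i] = best_word
        text = " ".join(words)
        pa, pb = model_a(text), model_b(text)
        if max(pa, key=pa.get) == target_a and max(pb, key=pb.get) == target_b:
            break  # both target misclassifications achieved
    return " ".join(words)

if __name__ == "__main__":
    # Hypothetical victim models: the same substituted word pushes model A
    # toward "entailment" and model B toward "contradiction".
    model_a = toy_classifier("advance", "entailment")
    model_b = toy_classifier("advance", "contradiction")
    adversarial = dual_targeted_attack(
        "the units move forward at dawn",
        model_a, "entailment",
        model_b, "contradiction",
        synonyms={"move": ["advance", "shift"]},
    )
    print(adversarial)  # e.g. "the units advance forward at dawn"
```

The sketch only shows the dual-target scoring idea (summing the two target-label probabilities when choosing a substitute); the paper additionally constrains substitutions to preserve the meaning and grammar of the original sentence.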

Bibliographic Details
Main Author: Hyun Kwon (Department of Artificial Intelligence and Data Science, Korea Military Academy, Seoul, South Korea; ORCID: https://orcid.org/0000-0003-1169-9892)
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access, vol. 11, pp. 15164-15173
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3121366
Subjects: Machine learning; evasion attack; deep neural network (DNN); text classification; text adversarial example
Online Access: https://ieeexplore.ieee.org/document/9580824/