Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

© Springer Nature Switzerland AG 2020. Recently, generating adversarial examples has become an important means of measuring the robustness of a deep learning model. Adversarial examples help us identify a model's vulnerabilities and counter them by applying adversarial training techniques. In the natural language domain, small perturbations in the form of misspellings or paraphrases can drastically change the semantics of the text. We propose a reinforcement learning based approach to generating adversarial examples in black-box settings. We demonstrate that our method is able to fool well-trained models on (a) the IMDB sentiment classification task and (b) the AG's News corpus news categorization task with significantly high success rates. We find that the generated adversarial examples are semantics-preserving perturbations of the original text.
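
The abstract describes the approach only at a high level. As a rough, self-contained illustration of the black-box setting it mentions, the Python sketch below queries a toy stand-in classifier for its confidence, applies small misspelling-style perturbations, and keeps an edit whenever the confidence in the true label drops (the reward signal). The toy classifier, the perturbation operator, and the greedy search are illustrative assumptions only; they are not the paper's reinforcement learning model.

    import random

    # Hypothetical stand-in for the target model: a keyword-based "sentiment
    # classifier" that the attacker may only query (black-box access), never inspect.
    POSITIVE_WORDS = {"great", "wonderful", "excellent", "enjoyable"}

    def black_box_classifier(text: str) -> float:
        """Return a confidence score for the positive label; internals are hidden."""
        tokens = text.lower().split()
        hits = sum(token in POSITIVE_WORDS for token in tokens)
        return min(1.0, 0.3 + 0.25 * hits)

    def perturb(text: str, rng: random.Random) -> str:
        """One small, misspelling-style edit: swap two adjacent characters in a random word."""
        words = text.split()
        idx = rng.randrange(len(words))
        word = words[idx]
        if len(word) > 3:
            j = rng.randrange(len(word) - 1)
            word = word[:j] + word[j + 1] + word[j] + word[j + 2:]
        words[idx] = word
        return " ".join(words)

    def attack(text: str, budget: int = 200, seed: int = 0) -> str:
        """Greedy reward-guided search: keep a perturbation whenever it lowers the
        classifier's confidence in the original (positive) label."""
        rng = random.Random(seed)
        best, best_score = text, black_box_classifier(text)
        for _ in range(budget):
            candidate = perturb(best, rng)
            score = black_box_classifier(candidate)
            if score < best_score:      # reward: confidence drop on the true label
                best, best_score = candidate, score
            if best_score < 0.5:        # prediction flipped: adversarial example found
                break
        return best

    if __name__ == "__main__":
        original = "a great and wonderful film with an excellent cast"
        adversarial = attack(original)
        print(original, "->", black_box_classifier(original))
        print(adversarial, "->", black_box_classifier(adversarial))

The actual paper replaces the toy classifier with well-trained IMDB and AG's News models and the greedy search with a learned, reinforcement-trained generator that keeps the perturbations semantics-preserving.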

Bibliographic Details
Main Authors: Vijayaraghavan, P; Roy, D
Other Authors: Massachusetts Institute of Technology. Media Laboratory
Format: Article
Language: English
Published: Springer International Publishing, 2021
Online Access: https://hdl.handle.net/1721.1/137065
Citation: Vijayaraghavan, P and Roy, D. 2020. "Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model." Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11907 LNAI.
DOI: 10.1007/978-3-030-46147-8_43
Type: Conference Paper (http://purl.org/eprint/type/ConferencePaper)
License: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)
Source: arXiv