Assessing robustness of text classification through maximal safe radius computation
Neural network NLP models are vulnerable to small modifications of the input that maintain the original meaning but result in a different prediction. In this paper, we focus on robustness of text classification against word substitutions, aiming to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym.
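As a formal reading of the measure described above: the maximal safe radius (MSR) of an input is the minimum embedding-space distance to the decision boundary, so any perturbation smaller than it provably cannot change the prediction. The notation below (classifier f, embedding map φ, norm ‖·‖ₚ) is assumed for illustration rather than taken from the paper:

$$
\mathrm{MSR}(f, x) \;=\; \inf_{x'} \bigl\{ \, \lVert \phi(x) - \phi(x') \rVert_p \;:\; f(x') \neq f(x) \, \bigr\}
$$

A prediction-flipping substitution found by search therefore upper-bounds MSR(f, x), while a certified lower bound guarantees robustness to all substitutions within that radius; a toy sketch of the upper-bound search is given after the catalogue record below.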
Main Authors: | La Malfa, E, Wu, M, Laurenti, L, Wang, B, Hartshorn, A, Kwiatkowska, M |
---|---|
Format: | Conference item |
Language: | English |
Published: | Association for Computational Linguistics, 2020 |
_version_ | 1826271934131732480 |
---|---|
author | La Malfa, E; Wu, M; Laurenti, L; Wang, B; Hartshorn, A; Kwiatkowska, M |
author_facet | La Malfa, E; Wu, M; Laurenti, L; Wang, B; Hartshorn, A; Kwiatkowska, M |
author_sort | La Malfa, E |
collection | OXFORD |
description | Neural network NLP models are vulnerable to small modifications of the input that maintain the original meaning but result in a different prediction. In this paper, we focus on robustness of text classification against word substitutions, aiming to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym. As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary. Since computing the exact maximal safe radius is not feasible in practice, we instead approximate it by computing a lower and upper bound. For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions. The lower bound computation is achieved through an adaptation of the linear bounding techniques implemented in tools CNN-Cert and POPQORN, respectively for convolutional and recurrent network models. We evaluate the methods on sentiment analysis and news classification models for four datasets (IMDB, SST, AG News and NEWS) and a range of embeddings, and provide an analysis of robustness trends. We also apply our framework to interpretability analysis and compare it with LIME. |
first_indexed | 2024-03-06T22:04:34Z |
format | Conference item |
id | oxford-uuid:4fb5398d-cba8-47e8-9ff8-47bea786fd55 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-06T22:04:34Z |
publishDate | 2020 |
publisher | Association for Computational Linguistics |
record_format | dspace |
spelling | oxford-uuid:4fb5398d-cba8-47e8-9ff8-47bea786fd55; 2022-03-26T16:09:04Z; Assessing robustness of text classification through maximal safe radius computation; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:4fb5398d-cba8-47e8-9ff8-47bea786fd55; English; Symplectic Elements; Association for Computational Linguistics; 2020; La Malfa, E; Wu, M; Laurenti, L; Wang, B; Hartshorn, A; Kwiatkowska, M; Neural network NLP models are vulnerable to small modifications of the input that maintain the original meaning but result in a different prediction. In this paper, we focus on robustness of text classification against word substitutions, aiming to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym. As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary. Since computing the exact maximal safe radius is not feasible in practice, we instead approximate it by computing a lower and upper bound. For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions. The lower bound computation is achieved through an adaptation of the linear bounding techniques implemented in tools CNN-Cert and POPQORN, respectively for convolutional and recurrent network models. We evaluate the methods on sentiment analysis and news classification models for four datasets (IMDB, SST, AG News and NEWS) and a range of embeddings, and provide an analysis of robustness trends. We also apply our framework to interpretability analysis and compare it with LIME. |
spellingShingle | La Malfa, E; Wu, M; Laurenti, L; Wang, B; Hartshorn, A; Kwiatkowska, M; Assessing robustness of text classification through maximal safe radius computation |
title | Assessing robustness of text classification through maximal safe radius computation |
title_full | Assessing robustness of text classification through maximal safe radius computation |
title_fullStr | Assessing robustness of text classification through maximal safe radius computation |
title_full_unstemmed | Assessing robustness of text classification through maximal safe radius computation |
title_short | Assessing robustness of text classification through maximal safe radius computation |
title_sort | assessing robustness of text classification through maximal safe radius computation |
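To make the upper-bound idea in the abstract concrete: any syntactically plausible word substitution that flips the model's prediction gives an upper bound on the maximal safe radius, namely the embedding-space distance it moves the input. The sketch below is a minimal brute-force illustration of that principle only; the embeddings, synonym table and bag-of-embeddings classifier are invented for the example, and the paper's actual method uses Monte Carlo Tree Search with syntactic filtering over real convolutional and recurrent models.

```python
# Minimal sketch (not the paper's MCTS): a single-word substitution that flips
# the toy classifier's prediction upper-bounds the maximal safe radius by the
# embedding-space distance between the original word and its replacement.
# All data below (embeddings, synonym table, classifier) is hypothetical.
import math

# Hypothetical 2-d word embeddings; a real setup would load GloVe/word2vec.
EMB = {
    "the":   (0.0, 0.0),
    "movie": (0.0, 0.1),
    "good":  (0.9, 0.8),
    "great": (1.0, 0.9),
    "fine":  (0.4, 0.5),
    "bad":   (-0.9, -0.8),
    "awful": (-1.0, -0.9),
}

# Hypothetical "plausible substitution" table standing in for syntactic filtering.
SYNONYMS = {
    "good": ["great", "fine", "bad"],
    "bad":  ["awful", "fine", "good"],
}


def classify(words):
    """Toy sentiment classifier: sign of the summed first embedding coordinate."""
    score = sum(EMB[w][0] for w in words)
    return "pos" if score >= 0 else "neg"


def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def msr_upper_bound(words):
    """Smallest embedding distance of a single-word substitution that changes
    the prediction; returns float('inf') if no listed substitution flips it."""
    original = classify(words)
    best = float("inf")
    for i, word in enumerate(words):
        for sub in SYNONYMS.get(word, []):
            candidate = words[:i] + [sub] + words[i + 1:]
            if classify(candidate) != original:
                best = min(best, l2(EMB[word], EMB[sub]))
    return best


if __name__ == "__main__":
    text = ["the", "good", "movie"]
    print("prediction:", classify(text))              # -> pos
    print("MSR upper bound:", msr_upper_bound(text))  # distance of good -> bad
```

The certified lower bound described in the abstract goes the other way: linear relaxation of the network (adapting CNN-Cert and POPQORN style bounds) proves that no perturbation within some radius can flip the prediction, so the true maximal safe radius is bracketed between the two bounds.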