PCA as a defense against some adversaries

Bibliographic Details
Main Authors: Gupta, Aparna, Banburski, Andrzej, Poggio, Tomaso
Format: Article
Published: Center for Brains, Minds and Machines (CBMM) 2022
Online Access: https://hdl.handle.net/1721.1/141424
Description
Summary: Neural network classifiers are known to be highly vulnerable to adversarial perturbations in their inputs. Under the hypothesis that adversarial examples lie outside the sub-manifold of natural images, previous work has investigated the impact of principal components in data on adversarial robustness. In this paper we show that there exists a very simple defense mechanism in the case where adversarial images are separable in a previously defined $(k,p)$ metric. This defense is very successful against the popular Carlini-Wagner attack, but less so against some other common attacks like FGSM. It is interesting to note that the defense is still successful for relatively large perturbations.
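
The sketch below illustrates the general idea described in the summary: if adversarial images carry unusual energy outside the leading principal components of natural images, they can be separated from clean inputs by a simple PCA-based score. This is only a minimal illustration under that assumption, not the paper's $(k,p)$ metric; the component count k, the reconstruction-error score, and the percentile threshold are hypothetical choices made for the example.

    # Minimal sketch of a PCA-based detector for off-manifold inputs.
    # Assumption: adversarial images have more energy outside the top-k
    # principal components of clean data than natural images do. This is
    # illustrative only and does not reproduce the paper's (k, p) metric.
    import numpy as np
    from sklearn.decomposition import PCA

    def fit_detector(clean_images, k=50):
        """Fit PCA on flattened clean images and record a score threshold."""
        X = clean_images.reshape(len(clean_images), -1)
        pca = PCA(n_components=k).fit(X)
        # Score = relative reconstruction error after projecting onto the
        # top-k components (energy left outside the clean-data subspace).
        recon = pca.inverse_transform(pca.transform(X))
        scores = np.linalg.norm(X - recon, axis=1) / np.linalg.norm(X, axis=1)
        threshold = np.percentile(scores, 99)  # illustrative cutoff
        return pca, threshold

    def looks_adversarial(image, pca, threshold):
        """Flag an input whose off-subspace energy exceeds the clean-data cutoff."""
        x = image.reshape(1, -1)
        recon = pca.inverse_transform(pca.transform(x))
        score = np.linalg.norm(x - recon) / np.linalg.norm(x)
        return score > threshold

In use, one would fit the detector on a clean training set and reject or flag test inputs whose score exceeds the threshold; how well such a separation holds for a given attack (e.g. Carlini-Wagner versus FGSM) is exactly the question the paper studies.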