Probabilistic safety for bayesian neural networks

We study probabilistic safety for Bayesian Neural Networks (BNNs) under adversarial input perturbations. Given a compact set of input points, T ⊆ R m, we study the probability w.r.t. the BNN posterior that all the points in T are mapped to the same region S in the output space. In particular, this c...

Full description

Bibliographic Details
Main Authors: Wicker, M, Laurenti, L, Patane, A, Kwiatkowska, M
Format: Conference item
Language:English
Published: Journal of Machine Learning Research 2020
Description
Summary:We study probabilistic safety for Bayesian Neural Networks (BNNs) under adversarial input perturbations. Given a compact set of input points, T ⊆ R m, we study the probability w.r.t. the BNN posterior that all the points in T are mapped to the same region S in the output space. In particular, this can be used to evaluate the probability that a network sampled from the BNN is vulnerable to adversarial attacks. We rely on relaxation techniques from non-convex optimization to develop a method for computing a lower bound on probabilistic safety for BNNs, deriving explicit procedures for the case of interval and linear function propagation techniques. We apply our methods to BNNs trained on a regression task, airborne collision avoidance, and MNIST, empirically showing that our approach allows one to certify probabilistic safety of BNNs with millions of parameters.