Summary: We study probabilistic safety for Bayesian Neural Networks (BNNs) under adversarial input perturbations. Given a compact set of input points T ⊆ R^m, we study the probability, with respect to the BNN posterior, that all points in T are mapped to the same region S in the output space. In particular, this probability can be used
to evaluate the probability that a network sampled from the BNN is vulnerable to adversarial
attacks. We rely on relaxation techniques from
non-convex optimization to develop a method
for computing a lower bound on probabilistic safety for BNNs, deriving explicit procedures for the case of interval and linear function propagation techniques. We apply our
methods to BNNs trained on a regression task,
airborne collision avoidance, and MNIST, empirically showing that our approach allows one
to certify probabilistic safety of BNNs with
millions of parameters.
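
To make the quantity described above concrete, the following is a minimal illustrative sketch, not the certified lower-bound procedure developed in the paper: it estimates probabilistic safety by sampling weights from an assumed BNN posterior and checking each sampled network on the input set T with standard interval bound propagation. The function and variable names (interval_forward, estimate_probabilistic_safety, posterior_samples) are hypothetical, and T and S are taken to be axis-aligned boxes for simplicity.

```python
# Illustrative sketch: estimate P_safe(T, S) = Prob_{w ~ posterior}( f_w(T) subset of S )
# by sampling weight vectors and certifying each sampled network with interval
# bound propagation (IBP). Assumes a fully-connected ReLU network.
import numpy as np

def interval_forward(weights, biases, lo, hi):
    """Propagate the input box [lo, hi] through a ReLU network with IBP."""
    for idx, (W, b) in enumerate(zip(weights, biases)):
        center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
        new_center = W @ center + b
        new_radius = np.abs(W) @ radius
        lo, hi = new_center - new_radius, new_center + new_radius
        if idx < len(weights) - 1:          # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def estimate_probabilistic_safety(posterior_samples, x_lo, x_hi, out_lo, out_hi):
    """Fraction of posterior weight samples whose network provably maps the
    input box T = [x_lo, x_hi] into the output box S = [out_lo, out_hi]."""
    safe = 0
    for weights, biases in posterior_samples:
        y_lo, y_hi = interval_forward(weights, biases, x_lo, x_hi)
        if np.all(y_lo >= out_lo) and np.all(y_hi <= out_hi):
            safe += 1
    return safe / len(posterior_samples)
```

The Monte Carlo fraction above only illustrates the definition of probabilistic safety; the paper's contribution is a sound lower bound on this probability obtained via relaxation techniques from non-convex optimization.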