Summary: | We study the problem of certifying the robustness of Bayesian
neural networks (BNNs) to adversarial input perturbations. Specifically,
we define two notions of robustness for BNNs in an adversarial setting:
probabilistic robustness and decision robustness. The former concerns the
probabilistic behaviour of the network, that is, it ensures robustness
across the network's different stochastic realisations, while the latter
provides guarantees for the overall (output) decision of the BNN.
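To make these notions concrete, here is a sketch of one common formalisation consistent with the descriptions above (the notation is ours and may differ from the paper's). Given an input region $T$, a safe set of outputs $S$, and a posterior $p(w \mid \mathcal{D})$ over weights, probabilistic robustness is the posterior probability that a sampled network $f^{w}$ is robust on all of $T$, while decision robustness requires the posterior-averaged (predictive) output itself to be safe:
\[
P_{\mathrm{safe}}(T, S) = \mathrm{Prob}_{w \sim p(w \mid \mathcal{D})}\big[\forall x \in T:\ f^{w}(x) \in S\big],
\qquad
\forall x \in T:\ \mathbb{E}_{w \sim p(w \mid \mathcal{D})}\big[f^{w}(x)\big] \in S .
\]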
Although these robustness properties cannot be computed analytically, we
present a unified computational framework for efficiently and formally
bounding them. Our approach is based on weight interval sampling, integration
and bound propagation techniques, and can be applied to BNNs with a large
number of parameters, independently of the (approximate) inference method
employed to train the BNN. We evaluate the effectiveness of our method on
tasks including airborne collision avoidance, medical imaging and autonomous
driving, demonstrating that it can compute non-trivial guarantees on
medium-size images (i.e., over 16 thousand input parameters).
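As an illustration of how those ingredients can fit together, below is a minimal, self-contained sketch (ours, not the paper's code). For a toy fully connected BNN with a factorised Gaussian posterior, it samples hyper-rectangles in weight space around posterior samples, integrates the posterior mass each rectangle contains, and uses interval bound propagation to check that every network in the rectangle maps the whole input region into a safe set, yielding a sound lower bound on probabilistic robustness. The architecture, posterior parameters, safe set and all function names are illustrative assumptions.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical toy BNN: a 2 -> 8 -> 2 ReLU MLP with a factorised Gaussian
# posterior over its weights (the means/stds below are placeholders, not a
# trained posterior).
shapes = [(2, 8), (8,), (8, 2), (2,)]          # W1, b1, W2, b2
mu = [rng.normal(size=s) for s in shapes]
sd = [0.1 * np.ones(s) for s in shapes]

def interval_matmul(x_lo, x_hi, W_lo, W_hi):
    # Sound interval bound on x @ W: bound every product term by its four
    # corner values, then sum over the input dimension.
    xl, xh = x_lo[:, None], x_hi[:, None]
    corners = np.stack([xl * W_lo, xl * W_hi, xh * W_lo, xh * W_hi])
    return corners.min(axis=0).sum(axis=0), corners.max(axis=0).sum(axis=0)

def propagate(x_lo, x_hi, box):
    # Interval bound propagation of the input box through the MLP, for a
    # fixed hyper-rectangle `box` in weight space.
    (W1l, W1h), (b1l, b1h), (W2l, W2h), (b2l, b2h) = box
    lo, hi = interval_matmul(x_lo, x_hi, W1l, W1h)
    lo, hi = np.maximum(lo + b1l, 0.0), np.maximum(hi + b1h, 0.0)  # ReLU is monotone
    lo, hi = interval_matmul(lo, hi, W2l, W2h)
    return lo + b2l, hi + b2h

def box_mass(box):
    # Posterior probability mass of a weight box under the factorised
    # Gaussian: the "integration" step.
    m = 1.0
    for (lo, hi), mu_i, sd_i in zip(box, mu, sd):
        m *= np.prod(norm.cdf((hi - mu_i) / sd_i) - norm.cdf((lo - mu_i) / sd_i))
    return m

def sample_box(gamma=2.0):
    # A hyper-rectangle in weight space centred on one posterior sample.
    box = []
    for mu_i, sd_i in zip(mu, sd):
        w = rng.normal(mu_i, sd_i)
        box.append((w - gamma * sd_i, w + gamma * sd_i))
    return box

def certify(x_lo, x_hi, n_boxes=200):
    # Lower-bounds probabilistic robustness for the toy safe set "the
    # class-0 logit provably exceeds the class-1 logit on [x_lo, x_hi]".
    # Sampled boxes may overlap, so summing their masses could over-count;
    # keeping the best single certified box is a valid, if conservative,
    # lower bound.
    best = 0.0
    for _ in range(n_boxes):
        box = sample_box()
        out_lo, out_hi = propagate(x_lo, x_hi, box)
        if out_lo[0] > out_hi[1]:
            best = max(best, box_mass(box))
    return best

x = np.array([0.3, -0.7])
eps = 0.05
print("certified lower bound on P_safe:", certify(x - eps, x + eps))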