Summary: | <p>Probabilistic models are an indispensable tool in many scientific fields, from the social and medical sciences to physics and machine learning. In probabilistic programming, such models are specified as computer programs: a flexible yet precise representation that allows for automated analysis. Such probabilistic programs can express both randomized algorithms and statistical or machine learning models. This thesis focuses on the Bayesian framework for reasoning and learning under uncertainty, where prior beliefs about the world (represented as a probability distribution on the model parameters) are updated based on observed data using Bayes' rule. The computation of this posterior distribution, known as Bayesian inference, presents a major challenge.</p>
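<p>Concretely (writing θ for the model parameters and D for the observed data; this notation is used only for illustration here), Bayes' rule expresses the posterior as the prior reweighted by the likelihood:</p>
<p><em>p</em>(θ | D) = <em>p</em>(D | θ) · <em>p</em>(θ) / <em>p</em>(D).</p>
<p>The difficulty lies in the normalizing constant <em>p</em>(D), which is in general an intractable sum or integral over all parameter values.</p>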
<p>Probabilistic programming systems allow practitioners to reuse general-purpose inference algorithms, rather than manually deriving new ones for each model. These automated inference methods fall into two categories: exact and approximate. Exact inference is often intractable or imposes restrictions on the supported models; approximate methods are more scalable and widely applicable, but the quality of their approximations can be difficult to assess.</p>
<p>This thesis introduces the concept of <em>guaranteed bounds</em> on the posterior distribution as a middle ground between exact and approximate inference. These bounds can be computed for a much larger class of probabilistic programs than exact methods and, unlike approximate methods, provide non-stochastic guarantees about the posterior distribution, such as “the posterior probability of a given event is at least <em>p</em> and at most <em>q</em>”. Crucially, the computation of these bounds is fully automated and requires no user-provided auxiliary information, such as loop invariants. The bounds can thus serve as a ground truth for verifying and debugging approximate inference. As a special case, they can be applied to the automated analysis of randomized algorithms, which is itself a research area of considerable interest.</p>
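<p>To make the flavor of such a guarantee concrete, consider a minimal sketch in Python (the model, the event, and all names below are illustrative choices, not taken from the thesis): for a program that flips a fair coin until the first heads, one can bound the probability that the number of tails is even by enumerating all program paths up to a finite depth and treating the unexplored probability mass as unknown.</p>
<pre><code># Minimal sketch: guaranteed bounds via finite path enumeration.
# Model: flip a fair coin until the first heads; X = number of tails seen.
# Event: "X is even". Paths deeper than `depth` are left unexplored, so
# their total probability mass widens the interval between the bounds.

def bounds_even_tails(p_heads=0.5, depth=10):
    lower = 0.0        # mass of explored paths on which the event holds
    explored = 0.0     # total mass of explored paths
    for k in range(depth + 1):  # path: k tails followed by one heads
        path_prob = (1 - p_heads) ** k * p_heads
        explored += path_prob
        if k % 2 == 0:          # the event holds on this path
            lower += path_prob
    # Worst case: all unexplored mass satisfies the event.
    upper = lower + (1.0 - explored)
    return lower, upper

for depth in (2, 5, 10, 20):
    lo, hi = bounds_even_tails(depth=depth)
    print(f"depth={depth:2d}: P(X even) in [{lo:.6f}, {hi:.6f}]")
</code></pre>
<p>Increasing the depth shrinks the interval towards the exact answer (2/3 in this example). Note that this naive sketch ignores floating-point rounding; a rigorous implementation would account for it using interval arithmetic, in the spirit of the first approach described below.</p>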
<p>The main contribution of this thesis is a set of methods for computing guaranteed bounds for probabilistic programs. The first approach uses abstract interpretation with interval arithmetic and is applicable to a universal probabilistic programming language with higher-order functions, continuous distributions, and recursion. The remainder of the thesis deals with imperative probabilistic programs limited to linear operations and discrete observations. The second approach transforms any exact inference algorithm for loop-free programs into a method that computes guaranteed bounds for programs with loops. This transformation is applied to a novel exact inference method for loop-free programs that leverages probability generating functions and is of independent interest. The third approach provides more informative bounds for probabilistic programs that use only discrete distributions, including bounds on moments and tail asymptotics.</p>
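<p>To give a rough idea of the generating-function view (a sketch using the off-the-shelf sympy library; the geometric model and all names are illustrative, and this is not the thesis's implementation): the distribution of a discrete program variable X can be encoded as its probability generating function G(x) = E[x<sup>X</sup>], from which individual probabilities and moments can be extracted symbolically.</p>
<pre><code># Sketch: encoding a discrete distribution as a probability
# generating function (PGF) and extracting moments symbolically.
import sympy as sp

x = sp.symbols("x")
p = sp.Rational(1, 2)  # success probability of each coin flip

# X ~ Geometric(p) on {0, 1, 2, ...}: number of tails before the first
# heads. Its PGF is G(x) = E[x**X] = p / (1 - (1 - p)*x).
G = p / (1 - (1 - p) * x)

mean = sp.diff(G, x).subs(x, 1)            # E[X] = G'(1)
second_fact = sp.diff(G, x, 2).subs(x, 1)  # E[X*(X-1)] = G''(1)
variance = second_fact + mean - mean**2

print("E[X]   =", mean)      # (1-p)/p    = 1
print("Var[X] =", variance)  # (1-p)/p**2 = 2

# Probabilities are Taylor coefficients of G at x = 0:
print("P(X=3) =", G.series(x, 0, 4).removeO().coeff(x, 3))  # 1/16
</code></pre>
<p>Roughly speaking, the statements of a discrete loop-free program act as transformations on such generating functions, which is what makes this representation amenable to exact inference.</p>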
<p>All these methods enjoy desirable theoretical properties, such as soundness and, in many cases, convergence to the exact result as the computational budget increases. Experimental evaluations demonstrate that they are applicable to more probabilistic programs than exact inference methods and that they can detect problems with approximate inference. Overall, these contributions mark a significant step towards formal verification of Bayesian inference results in probabilistic programming.</p>