Fast and correct variational inference for probabilistic programming: Differentiability, reparameterisation and smoothing
author | Wagner, D |
author2 | Ong, L |
collection | OXFORD |
description | <p>Probabilistic programming is an innovative programming paradigm for posing and automatically solving Bayesian inference problems. In this thesis, we study the foundations of fast yet correct inference for probabilistic programming.</p>
<p>Many of the most successful inference techniques (e.g. Hamiltonian Monte Carlo or Stochastic Variational Inference) harness gradients of the so-called density function, which therefore needs to be differentiable at least almost everywhere. We resolve a question posed by Hongseok Yang by demonstrating the following: densities of almost surely terminating programs are differentiable almost everywhere.</p>
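<p>To make the phenomenon concrete, here is a minimal NumPy/SciPy sketch (the model, observation and names are illustrative, not taken from the thesis) of a branching program whose density fails to be differentiable only on a measure-zero set:</p>

```python
import numpy as np
from scipy.stats import norm

# Illustrative model (not from the thesis): a branch on a latent sample
# makes the joint density non-differentiable on the measure-zero set {z = 0},
# yet differentiable almost everywhere.
#
#   z ~ Normal(0, 1)
#   y ~ Normal(1, 1)   if z > 0
#   y ~ Normal(-1, 1)  otherwise
def joint_density(z, y_obs):
    prior = norm.pdf(z, loc=0.0, scale=1.0)
    if z > 0:
        likelihood = norm.pdf(y_obs, loc=1.0, scale=1.0)
    else:
        likelihood = norm.pdf(y_obs, loc=-1.0, scale=1.0)
    return prior * likelihood

# The density jumps at z = 0 (so no derivative exists there), but {0} has
# Lebesgue measure zero, so the density is differentiable almost everywhere.
print(joint_density(-1e-9, y_obs=0.5), joint_density(1e-9, y_obs=0.5))
```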
<p>Having established this property, which is necessary for the correctness of gradient-based inference algorithms, we investigate variational inference, which frames posterior inference as an optimisation problem, in more detail. In practice, the dominant approach to the resulting stochastic optimisation problem is stochastic gradient descent. In particular, a variant using the so-called reparameterisation gradient estimator exhibits low variance, resulting in fast convergence in a traditional statistics setting. Unfortunately, discontinuities, although confined to a set of measure zero, can compromise the correctness of this approach.</p>
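<p>As a rough illustration of the estimator in question, the following sketch applies the reparameterisation z = mu + sigma * eps to a Gaussian variational family; the objective f is a placeholder standing in for an ELBO term, not the thesis's objective:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Objective under the expectation E_{z ~ N(mu, sigma^2)}[f(z)];
# f is an illustrative placeholder, not the thesis's ELBO.
f = lambda z: (z - 2.0) ** 2
df = lambda z: 2.0 * (z - 2.0)          # derivative of f

def reparam_grad(mu, sigma, n=1000):
    """Estimate d/dmu and d/dsigma of E[f(z)] by sampling eps ~ N(0, 1)
    and differentiating through z = mu + sigma * eps (the reparameterisation)."""
    eps = rng.standard_normal(n)
    z = mu + sigma * eps
    # chain rule: dz/dmu = 1, dz/dsigma = eps
    return np.mean(df(z)), np.mean(df(z) * eps)

print(reparam_grad(mu=0.0, sigma=1.0))
```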
<p>Therefore, we propose a smoothed interpretation parameterised by an accuracy coefficient and present type systems establishing the technical preconditions. This allows us to prove that stochastic gradient descent with the reparameterisation gradient estimator is correct when applied to the smoothed problem. Moreover, via a uniform convergence result, the original problem can be solved up to any error tolerance by choosing the accuracy coefficient suitably.</p>
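<p>The flavour of such a smoothing can be sketched as follows: a hard branch indicator is replaced by a sigmoid whose sharpness is governed by an accuracy coefficient; the particular smoothing function and the name eta are assumptions made for illustration only:</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hard branch: discontinuous at z = 0, so objectives containing it can
# bias the reparameterisation gradient estimator.
def hard_branch(z):
    return 1.0 if z > 0 else 0.0

# Smoothed interpretation: differentiable everywhere. eta is the accuracy
# coefficient; as eta -> 0 the smoothing tends pointwise to the hard branch.
def smoothed_branch(z, eta):
    return sigmoid(z / eta)

for eta in (1.0, 0.1, 0.01):
    print(eta, smoothed_branch(0.2, eta))
```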
<p>Furthermore, rather than fixing an accuracy coefficient in advance (limiting the quality of the final solution), we propose a novel variant of stochastic gradient descent, Diagonalisation Stochastic Gradient Descent, which progressively enhances the accuracy of the smoothed approximation during optimisation, and we prove convergence to stationary points of the unsmoothed (original) objective.</p>
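<p>Schematically (with illustrative schedules and a toy gradient oracle, not the thesis's), the diagonalisation idea interleaves tightening of the accuracy coefficient with the usual stochastic-gradient updates:</p>

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_smoothed(theta, eta):
    """Placeholder stochastic gradient of the eta-smoothed objective
    (here: a toy quadratic whose gradient noise shrinks with eta)."""
    return 2.0 * (theta - 2.0) + eta * rng.standard_normal()

def diagonalisation_sgd(theta, steps=10_000):
    for k in range(1, steps + 1):
        eta = 1.0 / np.sqrt(k)    # accuracy coefficient tightened over time (illustrative schedule)
        gamma = 0.1 / k           # diminishing step size (illustrative schedule)
        theta -= gamma * grad_smoothed(theta, eta)
    return theta

print(diagonalisation_sgd(theta=0.0))   # converges towards the minimiser of the toy objective
```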
<p>An empirical evaluation reveals the benefits of our approaches over the state of the art: they are simple, fast, and attain orders-of-magnitude reductions in work-normalised variance. Moreover, Diagonalisation Stochastic Gradient Descent is more stable than standard stochastic gradient descent applied to a fixed-accuracy smoothing.</p>
<p>Finally, we show that the reparameterisation gradient estimator is unbiased for continuous but non-differentiable models, and we propose a method based on higher-order logic to establish continuity in the presence of conditionals. We provide a sound and complete reduction of the continuity-verification problem for models to a satisfiability problem, and we propose novel, efficient randomised decision procedures.</p> |
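<p>The underlying intuition can be illustrated with a toy randomised spot-check: a conditional <code>if g(x) >= 0 then f1(x) else f2(x)</code> is continuous only if the branches agree wherever the guard vanishes. The sketch below merely conveys the flavour of such randomised procedures; it is not the thesis's actual reduction or decision procedure:</p>

```python
import numpy as np

rng = np.random.default_rng(2)

# Conditional model fragment: if g(x) >= 0 then f1(x) else f2(x).
g  = lambda x: x[0] - x[1]      # guard (illustrative)
f1 = lambda x: x[0] ** 2        # then-branch
f2 = lambda x: x[0] * x[1]      # else-branch

def branches_agree_on_boundary(trials=1000, tol=1e-9):
    """Randomised spot-check: sample points on the guard's zero set {g = 0}
    and test whether the two branches coincide there (a necessary condition
    for continuity of the conditional)."""
    for _ in range(trials):
        t = rng.normal()
        x = np.array([t, t])    # parametrises the boundary {x0 - x1 = 0}
        if abs(f1(x) - f2(x)) > tol:
            return False
    return True

print(branches_agree_on_boundary())   # True: x0**2 == x0*x1 whenever x0 == x1
```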
format | Thesis |
id | oxford-uuid:76fbdc5e-37af-4254-a41b-25c67f66fdb0 |
institution | University of Oxford |
language | English |
publishDate | 2023 |
title | Fast and correct variational inference for probabilistic programming: Differentiability, reparameterisation and smoothing |
topic | Variational Inference; Probabilistic Programming |