Proximal Gradient Algorithms for Gaussian Variational Inference: Optimization in the Bures–Wasserstein Space

Variational inference (VI) seeks to approximate a target distribution π by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates π by minimizing the Kullback–Leibler (KL) divergence to π over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB–GVI) algorithm to solve Gaussian VI. Our approach exploits the composite structure of the KL divergence, which can be written as the sum of a smooth term (the potential) and a non-smooth term (the entropy) over the Bures–Wasserstein (BW) space of Gaussians endowed with the Wasserstein distance. For our proposed algorithm, we obtain state-of-the-art convergence guarantees when π is log-smooth and log-concave, as well as the first convergence guarantees to first-order stationary solutions when π is only log-smooth. Additionally, in the setting where the potential admits a representation as the average of many smooth component functionals, we develop and analyze a variance-reduced extension to (Stochastic) FB–GVI with improved complexity guarantees.
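The composite structure referenced above can be made explicit. Writing the target as π ∝ e^{−V}, the objective minimized over Gaussians μ = N(m, Σ) splits, up to an additive constant, into a smooth term and a non-smooth term (a standard identity, spelled out here for orientation):

```latex
% KL objective over the Bures–Wasserstein space, for \pi \propto e^{-V}:
\mathcal{F}(\mu) \;=\; \mathrm{KL}(\mu \,\|\, \pi)
  \;=\; \underbrace{\int V \,\mathrm{d}\mu}_{\text{potential (smooth)}}
  \;+\; \underbrace{\int \log \mu \,\mathrm{d}\mu}_{\text{entropy (non-smooth)}}
  \;+\; \log Z .
```

For μ = N(m, Σ) the entropy term equals −(1/2) log det Σ − (d/2) log(2πe), and forward-backward splitting treats the potential with an explicit gradient step and the entropy with a proximal (JKO) step, in direct analogy with Euclidean proximal gradient methods.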

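To make the forward-backward scheme concrete, here is a minimal Python sketch of one iteration on a Gaussian N(m, Σ), assuming access to the potential's gradient and Hessian; the function name `fb_gvi_step`, the Monte Carlo sample size, and other details are illustrative assumptions rather than the thesis's actual implementation:

```python
import numpy as np
from scipy.linalg import sqrtm

def fb_gvi_step(m, Sigma, grad_V, hess_V, h, n_samples=100, rng=None):
    """One illustrative forward-backward step for Gaussian VI.

    Forward: explicit Bures-Wasserstein gradient step on the smooth
    potential term E[V], with expectations estimated by Monte Carlo.
    Backward: proximal (JKO) step for the non-smooth entropy term,
    which admits a closed form on Gaussians.
    """
    if rng is None:
        rng = np.random.default_rng()
    d = m.shape[0]
    L = np.linalg.cholesky(Sigma)
    X = m + rng.standard_normal((n_samples, d)) @ L.T  # X ~ N(m, Sigma)

    # Monte Carlo estimates of E[grad V(X)] and E[hess V(X)].
    b = np.mean([grad_V(x) for x in X], axis=0)
    S = np.mean([hess_V(x) for x in X], axis=0)

    # Forward step: pushforward by the affine map x -> x - h*(b + S(x - m)).
    m_new = m - h * b
    M = np.eye(d) - h * S
    Sigma_half = M @ Sigma @ M.T

    # Backward step: entropy prox.  The first-order optimality condition of
    # the JKO step reduces to Sigma' + h^2 Sigma'^{-1} = Sigma_half + 2h*I,
    # whose positive-definite solution is:
    Sigma_new = 0.5 * (Sigma_half + 2 * h * np.eye(d)
                       + np.real(sqrtm(Sigma_half @ Sigma_half
                                       + 4 * h * Sigma_half)))
    return m_new, Sigma_new
```

Note that the entropy prox leaves the mean unchanged, since the entropy is translation-invariant; only the covariance is updated in the backward step.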

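For the finite-sum setting mentioned at the end of the abstract, where the potential is an average V = (1/n) sum_i V_i, variance reduction can be realized with a control-variate gradient estimator in the style of SVRG. The sketch below shows only the generic construction; the names are hypothetical, and the thesis's estimator may differ in its details:

```python
def vr_gradient(i, x, grad_V_i, x_snap, full_grad_snap):
    """SVRG-style control-variate estimate of grad V(x), V = (1/n) sum_i V_i.

    Unbiased over the random index i, with variance that shrinks as x
    approaches the snapshot point x_snap.  full_grad_snap is the full
    gradient (1/n) * sum_i grad V_i(x_snap), recomputed only occasionally,
    which is the source of the improved complexity over plain stochastic
    gradients.
    """
    return grad_V_i(i, x) - grad_V_i(i, x_snap) + full_grad_snap
```
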
Bibliographic Details
Main Author: Diao, Michael Ziyang
Other Authors: Moitra, Ankur; Chewi, Sinho
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Thesis (M.Eng.)
Published: Massachusetts Institute of Technology, 2023
Online Access: https://hdl.handle.net/1721.1/151664
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/