Proximal Gradient Algorithms for Gaussian Variational Inference: Optimization in the Bures–Wasserstein Space
Variational inference (VI) seeks to approximate a target distribution π by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates π by minimizing the Kullback–Leibler (KL) divergence to π over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB–GVI) algorithm to solve Gaussian VI. Our approach exploits the composite structure of the KL divergence, which can be written as the sum of a smooth term (the potential) and a non-smooth term (the entropy) over the Bures–Wasserstein (BW) space of Gaussians endowed with the Wasserstein distance. For our proposed algorithm, we obtain state-of-the-art convergence guarantees when π is log-smooth and log-concave, as well as the first convergence guarantees to first-order stationary solutions when π is only log-smooth. Additionally, in the setting where the potential admits a representation as the average of many smooth component functionals, we develop and analyze a variance-reduced extension to (Stochastic) FB–GVI with improved complexity guarantees.
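The composite structure mentioned in the abstract is the decomposition KL(q‖π) = E_q[V] + E_q[log q] + const, with potential V = −log π: the smooth potential term is handled by a forward (gradient) step and the non-smooth entropy term by a backward (proximal) step in the Bures–Wasserstein geometry. The NumPy sketch below illustrates one stochastic update of this forward-backward type; it is a minimal illustration under these assumptions (placeholder callables `grad_V` and `hess_V`, step size `h`), not a transcription of the exact algorithm analyzed in the thesis.

```python
# Hypothetical sketch of one forward-backward step for Gaussian VI over the
# Bures-Wasserstein space, assuming the target pi is proportional to exp(-V)
# with V smooth. Illustrative only; not the thesis's exact FB-GVI scheme.
import numpy as np
from scipy.linalg import sqrtm

def fb_gvi_step(m, Sigma, grad_V, hess_V, h, n_samples=64, rng=None):
    """One (stochastic) forward-backward update of a Gaussian N(m, Sigma).

    Forward step:  gradient step on the smooth potential term E_q[V].
    Backward step: proximal step on the entropy term E_q[log q] in the
                   Bures-Wasserstein metric (closed form for Gaussians).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = m.shape[0]

    # Forward step: Monte Carlo estimates of E[grad V(X)] and E[hess V(X)]
    # under the current Gaussian, then an affine push-forward of N(m, Sigma).
    L = np.linalg.cholesky(Sigma)
    X = m + rng.standard_normal((n_samples, d)) @ L.T
    g = np.mean([grad_V(x) for x in X], axis=0)
    H = np.mean([hess_V(x) for x in X], axis=0)

    m_half = m - h * g
    A = np.eye(d) - h * H
    Sigma_half = A @ Sigma @ A.T

    # Backward step: proximal map of h * E_q[log q]. From its first-order
    # optimality condition, each eigenvalue s of Sigma_half is mapped to the
    # root of x + h^2/x = s + 2h, written here as a single matrix square root.
    S = 0.5 * (Sigma_half + Sigma_half.T)  # symmetrize for numerical stability
    Sigma_new = 0.5 * (S + 2 * h * np.eye(d)
                       + np.real(sqrtm(S @ (S + 4 * h * np.eye(d)))))
    return m_half, 0.5 * (Sigma_new + Sigma_new.T)
```

The backward step is what keeps every iterate a valid (nondegenerate) Gaussian: minimizing the entropy term through its proximal map inflates the covariance slightly rather than differentiating the non-smooth term directly.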
Main Author: | Diao, Michael Ziyang |
---|---|
Other Authors: | Moitra, Ankur; Chewi, Sinho |
Department: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Degree: | M.Eng. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Rights: | In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/) |
Online Access: | https://hdl.handle.net/1721.1/151664 |