Toward Faster Methods in Bayesian Unsupervised Learning

Bibliographic Details
Main Author: Nguyen, Tin D.
Other Authors: Broderick, Tamara
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156597
Description
Summary: Many data analyses can be seen as discovering a latent set of traits in a population. For example, what are the themes, or topics, behind Wikipedia documents? To encode structural information in these unsupervised learning problems, such as the hierarchy among words, documents, and latent topics, one can use Bayesian probabilistic models.

The application of Bayesian unsupervised learning faces three computational challenges. First, existing work aims to speed up Bayesian inference via parallelism, but these methods struggle in Bayesian unsupervised learning due to the so-called "label-switching problem". Second, in Bayesian nonparametrics for unsupervised learning, computers cannot learn the distribution over the countably infinite collection of random variables posited by the model in finite time. Finally, to assess the generalizability of Bayesian conclusions, we might want to detect the posterior's sensitivity to the removal of a very small amount of data, but checking this sensitivity directly takes an intractably long time.

My thesis addresses the first two computational challenges and establishes a first step toward tackling the last one. I use a known representation of the probabilistic model to evade the label-switching problem: when parallel processors are available, I derive fast estimates of Bayesian posteriors in unsupervised learning. Generalizing existing work and providing more guidance, I derive accurate and easy-to-use finite approximations for infinite-dimensional priors. Lastly, I assess generalizability in supervised Bayesian models, which can be seen as a precursor to the models used in Bayesian unsupervised learning: in this setting, I develop and test a computationally efficient tool that detects sensitivity to data removals in analyses based on MCMC.
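
To make the first challenge concrete, here is a minimal sketch (an illustration of the general phenomenon, not code from the thesis): a mixture model's likelihood is unchanged when component labels are permuted, so posterior samples drawn on different processors may use inconsistent labelings, and naively combining them mixes up components.

import numpy as np
from scipy.stats import norm

def mixture_loglik(x, weights, means):
    # Log-likelihood of 1-D data under a Gaussian mixture with unit variances.
    comp = norm.logpdf(x[:, None], loc=means[None, :])
    return np.logaddexp.reduce(np.log(weights)[None, :] + comp, axis=1).sum()

rng = np.random.default_rng(0)
x = rng.normal(size=50)
weights, means = np.array([0.3, 0.7]), np.array([-2.0, 2.0])
perm = np.array([1, 0])  # swap the two component labels

# Identical log-likelihoods: the data cannot identify which label is which,
# so two processors may settle on opposite labelings of the same posterior.
print(mixture_loglik(x, weights, means))
print(mixture_loglik(x, weights[perm], means[perm]))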
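For the second challenge, one standard way to build a finite approximation of an infinite-dimensional prior, in the spirit of the abstract (the thesis's own constructions may differ), is to truncate the stick-breaking representation of a Dirichlet process after K sticks:

import numpy as np

def truncated_stick_breaking(alpha, K, rng):
    # Draw K mixture weights from a truncated stick-breaking construction
    # of a Dirichlet process prior with concentration parameter alpha.
    betas = rng.beta(1.0, alpha, size=K)
    betas[-1] = 1.0  # take the whole remaining stick so the weights sum to 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
weights = truncated_stick_breaking(alpha=1.0, K=20, rng=rng)
print(weights.sum())  # 1.0: a valid finite-dimensional prior draw

Larger K yields a closer approximation to the infinite model; knowing how large K must be for a given accuracy is the kind of guidance the abstract refers to.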
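Finally, the brute-force version of the data-removal check makes the intractability plain; in this sketch, fit_posterior and summarize are hypothetical stand-ins for the analyst's inference routine and scalar summary of interest.

def dropout_sensitivity(data, fit_posterior, summarize):
    # Brute-force sensitivity check over a list of observations:
    # refit the posterior once per left-out point. With MCMC, each call
    # to fit_posterior is a full sampling run, so even leave-one-out
    # costs len(data) complete analyses.
    baseline = summarize(fit_posterior(data))
    shifts = []
    for i in range(len(data)):
        reduced = data[:i] + data[i + 1:]
        shifts.append(summarize(fit_posterior(reduced)) - baseline)
    return shifts

An efficient detection tool of the kind the abstract describes would approximate these shifts without rerunning MCMC for every candidate removal.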