Variational maximization-maximization of Bayesian mixture models and application to unsupervised image classification

This thesis mainly propose variational inference for Bayesian mixture models and their applications to solve machine learning problems. The mixture models addressed are the Gaussian mixture model (GMM), Dirichlet process mixture (DPM), the sparse coding based Gaussian mixture model (sGMM) and the Fi...

Full description

Bibliographic Details
Main Author: Lim, Kart-Leong
Other Authors: Wang Han
Format: Thesis
Language:English
Published: 2018
Subjects:
Online Access:http://hdl.handle.net/10356/73199
Description
Summary:This thesis mainly propose variational inference for Bayesian mixture models and their applications to solve machine learning problems. The mixture models addressed are the Gaussian mixture model (GMM), Dirichlet process mixture (DPM), the sparse coding based Gaussian mixture model (sGMM) and the Field-of-Gaussian (FoG) mixture model. In most mixture models, when using a Bayesian approach, usually an inherent problem will arise from analytically intractable posterior distributions of the mixture models. Most recent works either apply Gibbs sampling or variational inference to solve this problem. Variational inference rely on two assumptions mainly the mean field theory and conjugate priors. Despite proven faster convergence than Gibbs sampling, the learning algorithm of variational inference is not computationally efficient. The learning of variational inference can be widely seen as first estimating the class assignment variable and then applying it to estimate parameters of the mixture model. The estimate is usually performed by computing the expectations of the prior models. However, learning is not exclusive to expectation. Welling and Kurihara saw other possible configurations that use different combinations of maximization or expectation for the estimation. For instance, the learner in variational inference is generalized under the expectation-expectation (EE) algorithm. Inspired by this, another variant of EE known as the maximization-maximization (MM) algorithm is proposed for the Bayesian mixture models mentioned earlier. However, MM is not without issue. Firstly, it is very rare to find any theoretical study comparing MM to EE. Secondly, the computational efficiency and the accuracy of MM is rarely compared to EE. Hence, it is difficult to convince the use of MM over a mainstream learner such as EE or even Gibbs sampling. In this thesis, a revisit to the learning objective of EE on a Bayesian GMM case is made with comparison to the MM objective. In Bayesian nonparametrics, variational inference is not the mainstream in fast algorithms for DPM mainly due to high computational cost. Instead, most fast algorithms are largely based on MAP estimation of Gibbs sampling probabilities. However, they usually face intractable posterior and typically degenerate the conditional likelihood to overcome the inefficiency. Inspired by fast DPM algorithms, the MM algorithm is proposed for approximating expectations of variational posteriors for DPM. The proposed method outperforms variational inference in terms of clustering accuracy, model selection and has comparable computational efficiency to fast DPM algorithms. The Sparse Coding based Fisher Vector extends traditional GMM based Fisher Vector with a sparse term. In the original work, an off-the-shelf Sparse Coding solver is required. From a statistical perspective, an off-the-shelf solver may appear as a black-box. A more elegant way is to use a probablistic model to learn Sparse Coding. In this thesis a new GMM known as the Sparse Coding based GMM (sGMM) is proposed. It differs from standard GMM by an additional sparse coefficient hidden variable. Variational inference using the MM algorithm is then proposed to perform parameter estimation of sGMM. These parameters are then used to compute a Sparse Coding based Fisher Vector for image representation. Most unsupervised learners such as K-means, GMM or Sparse Coding do not use structure information found from the neighbourhood of an image patch for parameter estimation. A recent image prior technique combine unsupervised learning with Markov Random Field (MRF) by replacing the MRF potential with an unsupervised learner. It uses a neighbourhood of image patches for learning the parameters of the unsupervised learner, thus capturing image structure information. In this thesis, the above image prior technique is applied to the GMM parameter estimation so that the learnt model is more expressive than GMM learnt from a single image patch alone.