Stochastic expectation maximization with variance reduction

Expectation-Maximization (EM) is a popular tool for learning latent variable models, but vanilla batch EM does not scale to large data sets because the whole data set is needed at every E-step. Stochastic Expectation Maximization (sEM) reduces the cost of the E-step by stochastic approximation. However, sEM has a slower asymptotic convergence rate than batch EM, and requires a decreasing sequence of step sizes, which is difficult to tune. In this paper, we propose a variance-reduced stochastic EM (sEM-vr) algorithm inspired by variance-reduced stochastic gradient descent algorithms. We show that sEM-vr has the same exponential asymptotic convergence rate as batch EM. Moreover, sEM-vr only requires a constant step size to achieve this rate, which alleviates the burden of parameter tuning. We compare sEM-vr with batch EM, sEM and other algorithms on Gaussian mixture models and probabilistic latent semantic analysis, and sEM-vr converges significantly faster than these baselines.
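The abstract gives no pseudocode, so the following is only a rough illustrative sketch of the variance-reduction idea it describes, written in Python/NumPy for a Gaussian mixture with known unit covariances and uniform mixing weights (so only the component means are estimated). All names (responsibilities, suff_stats, m_step, sem_vr) and constants are assumptions for illustration, not the authors' reference implementation.

    # Sketch of variance-reduced stochastic EM (sEM-vr) for a K-component
    # Gaussian mixture with unit covariance and uniform weights; the only
    # parameters are the component means. Illustrative only.
    import numpy as np

    def responsibilities(X, mu):
        # E-step: posterior p(z = k | x) under unit-variance Gaussians.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (n, K)
        logp = -0.5 * d2
        logp -= logp.max(axis=1, keepdims=True)               # for stability
        r = np.exp(logp)
        return r / r.sum(axis=1, keepdims=True)

    def suff_stats(X, mu):
        # Averaged sufficient statistics: (mean responsibility,
        # mean responsibility-weighted observation).
        r = responsibilities(X, mu)                  # (n, K)
        s0 = r.mean(axis=0)                          # (K,)
        s1 = (r[:, :, None] * X[:, None, :]).mean(axis=0)  # (K, d)
        return s0, s1

    def m_step(s0, s1):
        # M-step: means are the weighted averages implied by the statistics.
        return s1 / np.maximum(s0, 1e-12)[:, None]

    def sem_vr(X, mu, rho=0.5, epochs=20, batch=10, rng=None):
        rng = np.random.default_rng(rng)
        n = len(X)
        s0, s1 = suff_stats(X, mu)                   # running statistics
        for _ in range(epochs):
            snap = mu.copy()                         # snapshot parameters
            g0, g1 = suff_stats(X, snap)             # full-data stats at snapshot
            for _ in range(n // batch):
                idx = rng.integers(0, n, size=batch)
                b0, b1 = suff_stats(X[idx], mu)      # minibatch stats, current params
                c0, c1 = suff_stats(X[idx], snap)    # minibatch stats, snapshot
                # SVRG-style low-variance estimate of the full-data statistics,
                # blended in with a CONSTANT step size rho.
                s0 = (1 - rho) * s0 + rho * (b0 - c0 + g0)
                s1 = (1 - rho) * s1 + rho * (b1 - c1 + g1)
                mu = m_step(s0, s1)                  # M-step after each update
        return mu

The SVRG-style correction (minibatch statistic at the current parameters, minus the same minibatch statistic at the snapshot, plus the full-data statistic at the snapshot) replaces the plain minibatch statistic that sEM would use; because the control variate shrinks as the iterates approach the snapshot, the variance of the estimate decays, which is what allows a constant step size rho instead of sEM's decreasing schedule.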

Bibliographic details
Main authors: Chen, J; Zhu, J; Teh, Y; Zhang, T
Format: Conference item
Published: Massachusetts Institute of Technology Press, 2018
Institution: University of Oxford