On smoothing and inference for topic models
Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
Main Authors: | Asuncion, A; Welling, M; Smyth, P; Teh, Y |
---|---|
Format: | Journal article |
Language: | English |
Published: | 2009 |
Institution: | University of Oxford |
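The abstract's central claim, that the per-token updates of MAP estimation, variational Bayes (VB), and collapsed Gibbs sampling (CGS) / CVB0 differ mainly in a constant offset subtracted from the smoothed counts, can be sketched concretely. The following Python snippet is an illustrative sketch, not code from the paper: the function and variable names are hypothetical, and it assumes the standard LDA count notation (word-topic counts N_wk, topic-document counts N_kj, topic totals N_k, hyperparameters alpha and eta). The key fact it leans on is the approximation exp(ψ(n)) ≈ n − 0.5, which makes VB behave like an offset of about 0.5, between CGS/CVB0 (offset 0) and MAP (offset 1).

```python
import numpy as np
from scipy.special import psi  # digamma function


def topic_responsibility(n_wk, n_kj, n_k, W, alpha, eta, method="cvb0"):
    """Normalized topic responsibilities for one (word w, document j) token.

    n_wk : (K,) counts of word w assigned to each topic
    n_kj : (K,) counts of tokens in document j assigned to each topic
    n_k  : (K,) total counts assigned to each topic
    W    : vocabulary size
    alpha, eta : Dirichlet hyperparameters on topic and word distributions
    """
    n_wk, n_kj, n_k = map(np.asarray, (n_wk, n_kj, n_k))
    if method == "vb":
        # VB passes the smoothed counts through exp(digamma(.)). Since
        # exp(psi(n)) ~ n - 0.5 for n >~ 1, this acts like an offset of ~0.5.
        g = (np.exp(psi(n_wk + eta)) * np.exp(psi(n_kj + alpha))
             / np.exp(psi(n_k + W * eta)))
    else:
        # CVB0 uses the raw smoothed counts (offset 0); MAP subtracts 1.
        # CGS uses the same offset-0 ratio on counts with the current token
        # removed, and samples a topic from it instead of averaging.
        delta = {"cvb0": 0.0, "map": 1.0}[method]
        g = ((n_wk + eta - delta) * (n_kj + alpha - delta)
             / (n_k + W * (eta - delta)))
    return g / g.sum()
```

A small worked example (made-up counts) shows why optimizing hyperparameters diminishes the differences among the algorithms: shifting MAP's hyperparameters by +1 cancels its offset of 1 exactly, reproducing the CVB0 update, while VB lands close by.

```python
K, W = 4, 1000
alpha, eta = 0.5, 0.1
n_wk = np.array([3.0, 20.0, 7.0, 1.0])
n_kj = np.array([10.0, 2.0, 5.0, 1.0])
n_k = np.array([400.0, 900.0, 350.0, 150.0])

print("cvb0:", topic_responsibility(n_wk, n_kj, n_k, W, alpha, eta, "cvb0"))
print("vb:  ", topic_responsibility(n_wk, n_kj, n_k, W, alpha, eta, "vb"))
# MAP with (alpha + 1, eta + 1) matches CVB0 with (alpha, eta) exactly:
print("map: ", topic_responsibility(n_wk, n_kj, n_k, W, alpha + 1, eta + 1, "map"))
```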