Regularization, robustness and sparsity of probabilistic topic models

We propose a generalized probabilistic topic model of text corpora which can incorporate heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. Well-known models PLSA, LDA, CVB0, SWB, and many others can be considered as special cases of the p...

Bibliographic Details
Main Authors: Konstantin Vyacheslavovich Vorontsov, Anna Alexandrovna Potapenko
Format: Article
Language: Russian
Published: Institute of Computer Science 2012-12-01
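Series: Компьютерные исследования и моделирование
Subjects: text analysis; topic modeling; probabilistic latent semantic analysis; EM-algorithm; latent Dirichlet allocation; Gibbs sampling; Bayesian regularization; perplexity; robustness
Online Access: http://crm.ics.org.ru/uploads/crmissues/crm_2012_4/12403.pdf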
_version_ 1818521239798415360
author Konstantin Vyacheslavovich Vorontsov
Anna Alexandrovna Potapenko
author_facet Konstantin Vyacheslavovich Vorontsov
Anna Alexandrovna Potapenko
author_sort Konstantin Vyacheslavovich Vorontsov
collection DOAJ
description We propose a generalized probabilistic topic model of text corpora which can incorporate heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. Well-known models PLSA, LDA, CVB0, SWB, and many others can be considered as special cases of the proposed broad family of models. We propose the robust PLSA model and show that it is sparser and performs better than regularized models such as LDA.
first_indexed 2024-12-11T01:48:24Z
format Article
id doaj.art-e08249df265942c8a5575feb34f6b1f5
institution Directory Open Access Journal
issn 2076-7633
2077-6853
language Russian
last_indexed 2024-12-11T01:48:24Z
publishDate 2012-12-01
publisher Institute of Computer Science
record_format Article
series Компьютерные исследования и моделирование
spelling doaj.art-e08249df265942c8a5575feb34f6b1f5 | 2022-12-22T01:24:50Z | rus | Institute of Computer Science | Компьютерные исследования и моделирование | 2076-7633 | 2077-6853 | 2012-12-01 | 4 | 4 | 693 | 706 | 10.20537/2076-7633-2012-4-4-693-706 | 1950 | Regularization, robustness and sparsity of probabilistic topic models | Konstantin Vyacheslavovich Vorontsov | Anna Alexandrovna Potapenko | We propose a generalized probabilistic topic model of text corpora which can incorporate heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. Well-known models PLSA, LDA, CVB0, SWB, and many others can be considered as special cases of the proposed broad family of models. We propose the robust PLSA model and show that it is sparser and performs better than regularized models such as LDA. | http://crm.ics.org.ru/uploads/crmissues/crm_2012_4/12403.pdf | text analysis | topic modeling | probabilistic latent semantic analysis | EM-algorithm | latent Dirichlet allocation | Gibbs sampling | Bayesian regularization | perplexity | robustness
spellingShingle Konstantin Vyacheslavovich Vorontsov
Anna Alexandrovna Potapenko
Regularization, robustness and sparsity of probabilistic topic models
Компьютерные исследования и моделирование
text analysis
topic modeling
probabilistic latent semantic analysis
EM-algorithm
latent Dirichlet allocation
Gibbs sampling
Bayesian regularization
perplexity
robustness
title Regularization, robustness and sparsity of probabilistic topic models
topic text analysis
topic modeling
probabilistic latent semantic analysis
EM-algorithm
latent Dirichlet allocation
Gibbs sampling
Bayesian regularization
perplexity
robustness
url http://crm.ics.org.ru/uploads/crmissues/crm_2012_4/12403.pdf
work_keys_str_mv AT konstantinvyacheslavovichvorontsov regularizationrobustnessandsparsityofprobabilistictopicmodels
AT annaalexandrovnapotapenko regularizationrobustnessandsparsityofprobabilistictopicmodels