Variational Bayes estimation of hierarchical Dirichlet-multinomial mixtures for text clustering

Cited by: 0
Authors
Massimo Bilancia
Michele Di Nanni
Fabio Manca
Gianvito Pio
Affiliations
[1] Department of Precision and Regenerative Medicine and Ionian Area (DiMePRe-J), University of Bari Aldo Moro, Policlinic University Hospital
[2] EY Business and Technology Solution
[3] Department of Education, Psychology, Communication (ForPsiCom), University of Bari Aldo Moro, Palazzo Chiaia - Napolitano
[4] Department of Computer Science, University of Bari Aldo Moro
Source
Computational Statistics | 2023 / Vol. 38
Keywords
Text clustering; Finite mixture models; Dirichlet-multinomial distribution; Bayesian hierarchical modelling; Variational inference
DOI
Not available
Abstract
In this paper, we formulate a hierarchical Bayesian version of the Mixture of Unigrams model for text clustering and approach its posterior inference through variational inference. We compute the explicit expression of the variational objective function for our hierarchical model under a mean-field approximation. We then derive the update equations of a suitable algorithm based on coordinate ascent to find local maxima of the variational target, and estimate the model parameters through the optimized variational hyperparameters. The advantages of variational algorithms over traditional Markov Chain Monte Carlo methods based on iterative posterior sampling are also discussed in detail.
Pages: 2015-2051
Page count: 36
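
The coordinate-ascent scheme described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or its hierarchical model. It assumes a simplified Bayesian mixture of unigrams with a symmetric Dirichlet(alpha) prior on the mixing weights and a symmetric Dirichlet(eta) prior on each cluster's word distribution, under a mean-field factorization. All names (`cavi`, `update_globals`, `expected_log`, `alpha`, `eta`) and the toy counts are hypothetical and chosen only for illustration.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 6, then asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return shift + math.log(x) - 0.5 / x - series


def expected_log(params):
    """E[log theta_i] when theta ~ Dirichlet(params)."""
    total = digamma(sum(params))
    return [digamma(p) - total for p in params]


def update_globals(counts, resp, n_clusters, alpha, eta):
    """Closed-form updates for q(pi) = Dir(a) and q(beta_k) = Dir(lam[k])."""
    n_docs, vocab = len(counts), len(counts[0])
    a = [alpha + sum(resp[d][k] for d in range(n_docs)) for k in range(n_clusters)]
    lam = [[eta + sum(resp[d][k] * counts[d][v] for d in range(n_docs))
            for v in range(vocab)] for k in range(n_clusters)]
    return a, lam


def cavi(counts, n_clusters, alpha=1.0, eta=0.1, n_iter=50, seed=0):
    """Coordinate ascent for a toy Bayesian mixture of unigrams (hypothetical helper)."""
    rng = random.Random(seed)
    resp = []
    for _ in counts:  # random soft initialisation of q(z_d)
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])
    a, lam = update_globals(counts, resp, n_clusters, alpha, eta)
    for _ in range(n_iter):
        e_log_pi = expected_log(a)
        e_log_beta = [expected_log(row) for row in lam]
        for d, doc in enumerate(counts):
            # log q(z_d = k) up to a constant: E[log pi_k] + sum_v n_dv E[log beta_kv]
            logits = [e_log_pi[k] + sum(n * e_log_beta[k][v] for v, n in enumerate(doc))
                      for k in range(n_clusters)]
            top = max(logits)  # subtract the max before exponentiating (log-sum-exp trick)
            weights = [math.exp(l - top) for l in logits]
            total = sum(weights)
            resp[d] = [w / total for w in weights]
        a, lam = update_globals(counts, resp, n_clusters, alpha, eta)
    return resp, a, lam


# Toy document-term counts: two groups of documents over a 4-word vocabulary.
docs = [[5, 4, 0, 0], [6, 3, 0, 0], [4, 5, 0, 0],
        [0, 0, 5, 5], [0, 0, 4, 6], [0, 0, 6, 3]]
resp, a, lam = cavi(docs, n_clusters=2)
labels = [max(range(2), key=lambda k: row[k]) for row in resp]
print(labels)
```

Each sweep alternates between the per-document responsibilities and the Dirichlet hyperparameters, so the variational objective cannot decrease. The result is a local optimum that depends on the random initialisation, and cluster labels are identified only up to permutation. In practice several restarts would typically be compared.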