Principled Selection of Hyperparameters in the Latent Dirichlet Allocation Model

Times cited: 0
Authors
George, Clint P. [1]
Doss, Hani [2]
Affiliations
[1] Univ Florida, Inst Informat, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Stat, Gainesville, FL 32611 USA
Keywords
Empirical Bayes inference; latent Dirichlet allocation; Markov chain Monte Carlo; model selection; topic modelling; CHAIN MONTE-CARLO
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Latent Dirichlet Allocation (LDA) is a well-known topic model that is often used to make inferences about the properties of collections of text documents. LDA is a hierarchical Bayesian model and involves a prior distribution on a set of latent topic variables. The prior is indexed by certain hyperparameters, and even though these have a large impact on inference, they are usually chosen either in an ad hoc manner or by applying an algorithm whose theoretical basis has not been firmly established. We present a method, based on a combination of Markov chain Monte Carlo and importance sampling, for approximating the maximum likelihood estimate of the hyperparameters. The method may be viewed as a computational scheme for implementing an empirical Bayes analysis. It comes with theoretical guarantees, and a key feature of our approach is that we provide theoretically valid error margins for our estimates. Experiments on both synthetic and real data show good performance of our methodology.
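The idea of combining posterior sampling with importance sampling to compare hyperparameter values can be illustrated on a much simpler model. The sketch below is not the authors' LDA implementation. It assumes a toy Dirichlet-multinomial model with a single symmetric concentration parameter alpha, uses exact conjugate posterior draws as a stand-in for an MCMC sample, and chooses alpha by a grid search. It relies on the identity m(alpha) / m(alpha_ref) = E[ nu_alpha(theta) / nu_ref(theta) ], where m is the marginal likelihood, nu is the prior density, and the expectation is taken over the posterior under a reference value alpha_ref. All function names (ratio_curve, exact_log_marginal, and so on) are hypothetical, and the exact marginal likelihood is included only to check the estimates.

```python
import math
import random

# Toy sketch (NOT the authors' LDA code): one multinomial count vector with a
# symmetric Dirichlet(alpha) prior on the category probabilities theta.


def log_sym_dirichlet(theta, alpha):
    """Log density of a symmetric Dirichlet(alpha, ..., alpha) at theta."""
    k = len(theta)
    log_norm = math.lgamma(k * alpha) - k * math.lgamma(alpha)
    return log_norm + (alpha - 1.0) * sum(math.log(t) for t in theta)


def sample_dirichlet(params, rng):
    """One Dirichlet(params) draw via normalized independent Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [x / total for x in g]


def ratio_curve(counts, alpha_ref, alpha_grid, n_draws=5000, seed=0):
    """Estimate m(alpha) / m(alpha_ref) and its standard error on a grid.

    Posterior draws under alpha_ref are exact conjugate draws here; they stand
    in for an MCMC sample. The standard error assumes independent draws.
    """
    rng = random.Random(seed)
    post = [alpha_ref + c for c in counts]
    draws = [sample_dirichlet(post, rng) for _ in range(n_draws)]
    ref_logs = [log_sym_dirichlet(t, alpha_ref) for t in draws]
    curve = {}
    for alpha in alpha_grid:
        # Log importance weights: prior density ratio nu_alpha / nu_ref.
        logs = [log_sym_dirichlet(t, alpha) - r for t, r in zip(draws, ref_logs)]
        shift = max(logs)  # rescale before exponentiating to avoid overflow
        w = [math.exp(l - shift) for l in logs]
        mean = sum(w) / n_draws
        var = sum((x - mean) ** 2 for x in w) / (n_draws - 1)
        scale = math.exp(shift)
        curve[alpha] = (scale * mean, scale * math.sqrt(var / n_draws))
    return curve


def exact_log_marginal(counts, alpha):
    """Exact log p(counts | alpha), omitting the multinomial coefficient."""
    k, n = len(counts), sum(counts)
    return (math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
            + sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts))


if __name__ == "__main__":
    counts = [30, 25, 20, 15, 10]
    grid = [0.5, 1.0, 2.0, 3.0]
    curve = ratio_curve(counts, alpha_ref=1.0, alpha_grid=grid)
    for a in grid:
        est, se = curve[a]
        exact = math.exp(exact_log_marginal(counts, a) - exact_log_marginal(counts, 1.0))
        print(f"alpha={a}: estimate={est:.4f} (se {se:.4f}), exact={exact:.4f}")
    # Empirical Bayes choice: the grid value with the largest estimated ratio.
    print("chosen alpha:", max(grid, key=lambda a: curve[a][0]))
```

With a single reference value, the importance weights become increasingly uneven as alpha moves away from alpha_ref, which is why the grid here stays close to it. The standard errors printed above are valid only for independent draws; correlated MCMC output would need a method such as batch means to produce honest error margins.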
Pages: 38