Supervised labeled latent Dirichlet allocation for document categorization

Cited by: 14
Authors
Li, Ximing [1, 2]
Ouyang, Jihong [1, 2]
Zhou, Xiaotang [1, 2]
Lu, You [1, 2]
Liu, Yanhui [1, 2]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130023, Peoples R China
[2] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130023, Peoples R China
Keywords
Supervised; Topic modeling; Latent Dirichlet allocation; Multi-label classification; MODEL
DOI
10.1007/s10489-014-0595-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, supervised topic modeling approaches have received considerable attention. However, the representative labeled latent Dirichlet allocation (L-LDA) method has a tendency to over-focus on the pre-assigned labels, and does not give potentially lost labels and common semantics sufficient consideration. To overcome these problems, we propose an extension of L-LDA, namely supervised labeled latent Dirichlet allocation (SL-LDA), for document categorization. Our model makes two fundamental assumptions, i.e., Prior 1 and Prior 2, that relax the restriction of label sampling and extend the concept of topics. In this paper, we develop a Gibbs expectation-maximization algorithm to learn the SL-LDA model. Quantitative experimental results demonstrate that SL-LDA is competitive with state-of-the-art approaches on both single-label and multi-label corpora.
Pages: 581-593
Number of pages: 13