Semi-supervised Document Clustering Based on Latent Dirichlet Allocation (LDA)

Cited by: 2
Authors
秦永彬
李解
黄瑞章
李晶
Affiliation
[1] College of Computer Science and Technology, Guizhou University
Keywords
latent Dirichlet allocation (LDA); semi-supervised learning; document clustering
DOI
10.19884/j.1672-5220.2016.05.001
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]
Discipline classification codes
081203; 0835
Abstract
To discover a personalized document structure that takes user preferences into account, the preferences are captured by a limited number of instance-level constraints and given as interested and uninterested key terms. A semi-supervised document clustering approach based on the latent Dirichlet allocation (LDA) model, named pLDA, is developed and guided by the user-provided key terms. A generalized Polya urn (GPU) model is proposed to integrate the user preferences into the document clustering process. A Gibbs sampler is investigated to infer the structure of the document collection. Experiments on real datasets were conducted to evaluate the performance of pLDA, and the results demonstrate that the pLDA approach is effective.
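The abstract does not give pLDA's sampling equations, so the following is only a minimal sketch of one common way to combine a generalized Polya urn with collapsed Gibbs sampling for LDA, not the authors' method. It assumes that assigning a key term to a topic also adds a fractional pseudo-count for the other key terms in the same user-provided group, and that each document is clustered by its dominant topic. The function name `gpu_lda_cluster`, the `promote` weight, the hyperparameter defaults, and the toy documents are all hypothetical; uninterested key terms are not modeled here.

```python
import random
from collections import defaultdict


def gpu_lda_cluster(docs, num_topics, key_term_groups, promote=0.3,
                    alpha=0.1, beta=0.01, iters=100, seed=0):
    """Cluster tokenized documents with LDA plus a generalized Polya urn.

    Illustrative sketch only: every parameter name and default is an
    assumption, not taken from the pLDA paper.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    # GPU scheme (assumed): drawing word w adds 1 count for w and `promote`
    # pseudo-counts for the other key terms in the same user-provided group.
    urn = {w: {w: 1.0} for w in vocab}
    for group in key_term_groups:
        members = sorted(t for t in group if t in urn)
        for w in members:
            for u in members:
                if u != w:
                    urn[w][u] = promote

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(float) for _ in range(num_topics)]
    topic_total = [0.0] * num_topics

    def update(d, w, k, sign):
        """Add (sign=+1) or remove (sign=-1) one token of w in topic k."""
        doc_topic[d][k] += sign
        for u, weight in urn[w].items():
            topic_word[k][u] += sign * weight
            topic_total[k] += sign * weight

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            update(d, w, k, +1)
        assignments.append(row)

    # Gibbs sweeps: resample each token's topic given all other tokens.
    # max(..., 0.0) guards against tiny negative values from float rounding.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (max(topic_word[k][w], 0.0) + beta)
                    / (max(topic_total[k], 0.0) + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                update(d, w, k, +1)

    # Cluster label of a document = its most frequent topic.
    return [max(range(num_topics), key=lambda k: counts[k])
            for counts in doc_topic]


if __name__ == "__main__":
    toy_docs = [["goal", "match", "team"], ["team", "league", "goal"],
                ["stock", "market", "bank"], ["bank", "loan", "market"]]
    groups = [{"goal", "league"}, {"stock", "loan"}]
    print(gpu_lda_cluster(toy_docs, num_topics=2, key_term_groups=groups))
```

Setting `promote` to 0 reduces the sketch to plain collapsed Gibbs sampling for LDA, which makes it easy to compare clusterings with and without the key-term guidance.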
Pages: 685-688
Page count: 4