Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence

Cited by: 0
Authors
Shao, Minglai [1]
Qin, Liangxi [1]
Affiliations
[1] Guangxi Univ, Sch Comp Elect & Informat, Nanning 530004, Peoples R China
Source
PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, KNOWLEDGE ENGINEERING AND INFORMATION ENGINEERING (SEKEIE 2014) | 2014 / Vol. 114
Keywords
topic model; LDA (Latent Dirichlet Allocation); JS (Jensen-Shannon) distance; word co-occurrence; similarity
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The LDA (Latent Dirichlet Allocation) topic model has been widely applied to text clustering owing to its efficient dimension reduction. The prevalent method is to model the text set with an LDA topic model, perform inference by Gibbs sampling, and calculate text similarity with the JS (Jensen-Shannon) distance. However, the JS distance cannot distinguish semantic associations among text topics. To address this defect, a new text similarity algorithm based on a hidden topic model and word co-occurrence analysis is introduced. Tests are carried out to verify the clustering effect of the improved algorithm. The results show that the method effectively improves both the text similarity computation and the text clustering accuracy.
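As a rough illustration of the baseline the abstract refers to, the sketch below computes the Jensen-Shannon divergence between two document-topic distributions. It assumes the distributions have already been inferred (for example by Gibbs sampling); the function names, the base-2 logarithm, and the toy vectors are assumptions made for this example and are not taken from the paper. It does not implement the paper's word co-occurrence improvement.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0, so KL is well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical document-topic distributions over three topics.
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.6, 0.3, 0.1]
doc_c = [0.1, 0.1, 0.8]

# A smaller divergence means the two topic mixtures are more alike.
print(js_divergence(doc_a, doc_b))
print(js_divergence(doc_a, doc_c))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (disjoint support). Because the measure only compares probabilities position by position, two documents dominated by different but semantically related topics still look far apart, which is the limitation the abstract points to.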
Pages: 199-203
Page count: 5
Related papers
13 in total
[1] Chang Peng, 2009, RES TERMS COOCCURREN, P30
[2] DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[3] Geng Huantong, 2006, Journal of Nanjing University. Natural Sciences, V42, P156
[4] Hofmann, T. Probabilistic latent semantic indexing [J]. SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999: 50-57
[5] Huang Bo, 2012, RES MICROBLOG TOPIC, P36
[6] Li Wen-Bo, 2008, Chinese Journal of Computers, V31, P620
[7] Phan Xuan-Hieu, 2008, Proceedings of the 17th international conference on World Wide Web, WWW '08, P91
[8] Quan, Xiaojun; Liu, Gang; Lu, Zhi; Ni, Xingliang; Wenyin, Liu. Short text similarity based on probabilistic topics [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 25 (03): 473-491
[9] Shi Jian-hong, 2014, Application Research of Computers, V31, P700, DOI 10.3969/j.issn.1001-3695.2014.03.014