Micro-Blog Topic Detection Method Based on BTM Topic Model and K-Means Clustering Algorithm

Cited by: 32
Authors
Li, Weijiang [1]
Feng, Yanming [1]
Li, Dongjun [2]
Yu, Zhengtao [1]
Affiliations
[1] Kunming Univ Sci & Technol, Dept Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Soochow Univ, Jinan Qingqi Peugeot Motorcycle Co Ltd, R&D Dept, Jinan 250104, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
short text; topic model; topic discovery; K-means clustering
DOI
10.3103/S0146411616040040
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The development of micro-blogs, which generate large-scale short texts, provides people with convenient communication. At the same time, discovering topics from short texts has become a genuinely difficult problem. Traditional topic models, such as probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA), struggle to model short texts because they suffer from severe data sparsity when processing them. Moreover, the K-means clustering algorithm can make topics discriminative when the dataset is dense and the differences among topic documents are distinct. In this paper, the Biterm Topic Model (BTM) is employed to process short texts (micro-blog data) in order to alleviate the sparsity problem. In addition, we integrate the K-means clustering algorithm into BTM for further topic discovery. Experimental results on Sina micro-blog short text collections demonstrate that our method can discover topics effectively.
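As a rough illustration of the two ingredients named in the abstract, the Python sketch below extracts biterms (unordered word pairs that co-occur in the same short text) and runs a plain K-means over per-document topic vectors. It is not the authors' implementation: the function names, the toy documents, and the document-topic vectors are all hypothetical, the BTM inference step that would estimate topic distributions from the biterms is omitted, and the abstract does not specify exactly how K-means is combined with BTM.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Pool biterms over the whole corpus; pooling is what eases per-document sparsity."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical tokenized micro-blog posts.
docs = [["apple", "phone", "release"], ["phone", "apple"]]
biterm_counts = corpus_biterm_counts(docs)

# Hypothetical document-topic vectors, standing in for the output of BTM inference.
doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = kmeans(doc_topic, k=2)
```

In this toy run the biterm ("apple", "phone") is counted once per document in which both words occur, and the four topic vectors separate into two clusters of two documents each.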
Pages: 271-277
Number of pages: 7