Topic Detection from Microblog Based on Text Clustering and Topic Model Analysis

Cited by: 4
Authors
Huang, Siqi [1]
Yang, Yitao [1]
Li, Huakang [1]
Sun, Guozi [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, 66 Xin Mofan Rd, Nanjing 210003, Jiangsu, Peoples R China
Source
2014 ASIA-PACIFIC SERVICES COMPUTING CONFERENCE (APSCC) | 2014
Keywords
Microblog; topic detection; text clustering; LDA; LATENT SEMANTIC ANALYSIS;
DOI
10.1109/APSCC.2014.18
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202 ;
Abstract
This paper proposes a Microblog topic detection method based on text clustering and topic model analysis. It addresses the problem that traditional topic detection methods are designed mainly for traditional media text and are not very effective on sparse, short Microblog texts. Because Microblog data is structured and carries rich inter-textual contextual information, such as retweets, comments, user hashtags, and embedded URLs, we first put forward a feature weight pre-processing method. We also use a clustering algorithm based on word vectors to enrich the feature information of the data. On this basis, we extend the conventional LDA (Latent Dirichlet Allocation) topic model to extract hot topics from the Microblog data. Compared with traditional methods, the proposed method is much more effective on a text corpus collected from Sina Microblog.
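The abstract does not give the feature weighting formula, so the following is only a minimal sketch of what a structure-aware term weighting step might look like. The function name `weighted_term_counts`, the constant `HASHTAG_BOOST`, the logarithmic engagement factor, and the `#tag` hashtag syntax with whitespace-delimited tokens are all assumptions made for illustration, not the authors' method.

```python
import math
import re
from collections import Counter

# Hypothetical constant: how much more a hashtag term counts than a plain term.
HASHTAG_BOOST = 2.0


def weighted_term_counts(text, retweets=0, comments=0):
    """Return a term -> weight mapping for one post.

    Assumptions (not from the paper): hashtags are written as "#tag",
    tokens are runs of word characters, and engagement (retweets plus
    comments) scales the whole post by 1 + log(1 + engagement).
    """
    hashtags = {tag.lower() for tag in re.findall(r"#(\w+)", text)}
    tokens = re.findall(r"\w+", text.lower())
    post_weight = 1.0 + math.log1p(retweets + comments)

    weights = Counter()
    for token in tokens:
        # A term that also appears as a hashtag is boosted on every occurrence.
        boost = HASHTAG_BOOST if token in hashtags else 1.0
        weights[token] += boost * post_weight
    return dict(weights)
```

Weighted counts of this kind could stand in for raw term frequencies before clustering or topic modelling. How the paper actually combines retweets, comments, hashtags, and URLs is not stated in this record.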
Pages: 88-92
Page count: 5
Related papers
17 in total
  • [1] [Anonymous], 2012, Health, DOI 10.1371/JOURNAL.PONE.0083672
  • [2] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [3] Cataldi M., 2010, P 10 INT WORKSH MULT
  • [4] Dai XY, 2010, MACH LEARN CYB ICMLC, V6
  • [5] Unsupervised learning by probabilistic latent semantic analysis
    Hofmann, T
    [J]. MACHINE LEARNING, 2001, 42 (1-2) : 177 - 196
  • [6] Kaur K, 2012, INT J, V2
  • [7] An introduction to latent semantic analysis
    Landauer, TK
    Foltz, PW
    Laham, D
    [J]. DISCOURSE PROCESSES, 1998, 25 (2-3) : 259 - 284
  • [8] Li Jin, 2012, Journal of Computer Applications, V32, P2346, DOI 10.3724/SP.J.1087.2012.02346
  • [9] Lu R, 2010, EXTRACTING NEWS TOPI
  • [10] Ma B, 2013, J CHINESE INFORM PRO, V26, P121