Topic Detection in Twitter Based on Label Propagation Model

Cited by: 1
Authors
Huang, Dongxu [1]
Mu, Dejun [1]
Affiliation
[1] Northwest Polytech Univ, Sch Automat, Xian, Peoples R China
Source
PROCEEDINGS OF THIRTEENTH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS TO BUSINESS, ENGINEERING AND SCIENCE, (DCABES 2014) | 2014
Keywords
topic detection; twitter; cluster algorithm; label propagation model
DOI
10.1109/DCABES.2014.23
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Huge numbers of tweets about real-world events are generated every day on Twitter. However, these messages are disorganized, and classifying them by topic and event is one of the main challenges in extracting knowledge from them effectively. To address this problem, we propose a novel method that combines a clustering algorithm with a label propagation algorithm to detect topics on Twitter. First, we use the canopy clustering algorithm to cluster tweets; because canopy clustering can assign a single tweet to several different clusters, only the tweets that belong to exactly one cluster are labeled at this stage. Second, label propagation is used to label the tweets that lie in the overlap of different clusters. To evaluate our algorithm, we compare it with two baseline algorithms: LDA (Latent Dirichlet Allocation) and the Single-Pass clustering algorithm. We apply all three algorithms to a tweet dataset containing three topics plus some noisy data, and the experimental results show that our method outperforms the other algorithms in both precision and recall.
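The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the abstract does not specify the tweet representation, the distance measure, the canopy thresholds, or the propagation rule. Here, tweets are assumed to be lowercase word sets compared with Jaccard distance; `t1` (loose) and `t2` (tight) are the usual canopy thresholds; and propagation is assumed to be a similarity-weighted vote in which single-canopy tweets keep fixed labels and each overlapping tweet may only take a label from the canopies it belongs to. The function names (`canopy_clusters`, `label_propagation`, `detect_topics`) are hypothetical.

```python
from collections import defaultdict


def jaccard_distance(a, b):
    """1 - Jaccard similarity of two token sets (0.0 means identical)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def canopy_clusters(docs, t1, t2):
    """Canopy clustering with a loose threshold t1 and a tight threshold t2 (t1 > t2).

    Returns a list of canopies (sets of document indices). A document within t1
    but not within t2 of a centre stays in the candidate pool, so it can also
    join later canopies; such documents are the overlapping tweets.
    """
    if not t1 > t2:
        raise ValueError("canopy clustering needs t1 > t2")
    remaining = list(range(len(docs)))
    canopies = []
    while remaining:
        center = remaining[0]
        canopy, still_remaining = {center}, []
        for i in remaining:
            d = jaccard_distance(docs[center], docs[i])
            if d < t1:
                canopy.add(i)  # loosely close: joins this canopy
            if i != center and d >= t2:
                still_remaining.append(i)  # not tightly close: stays a candidate
        canopies.append(canopy)
        remaining = still_remaining
    return canopies


def label_propagation(docs, canopies, iterations=20):
    """Label overlapping documents by similarity-weighted voting (assumed rule).

    Documents in exactly one canopy keep that canopy's index as a fixed label.
    Documents in several canopies start uniform over those canopies and are
    repeatedly updated from their neighbours' label distributions.
    """
    n = len(docs)
    membership = defaultdict(set)
    for label, canopy in enumerate(canopies):
        for i in canopy:
            membership[i].add(label)
    # Pairwise similarity used as edge weights (no self-loops).
    weight = [[0.0 if i == j else 1.0 - jaccard_distance(docs[i], docs[j])
               for j in range(n)] for i in range(n)]
    dist = {i: {l: 1.0 / len(membership[i]) for l in membership[i]} for i in range(n)}
    for _ in range(iterations):
        new_dist = {}
        for i in range(n):
            if len(membership[i]) == 1:
                new_dist[i] = dist[i]  # clamped: a single-canopy label never changes
                continue
            # Candidate labels are restricted to the canopies this tweet is in.
            scores = dict.fromkeys(membership[i], 0.0)
            for j in range(n):
                for l, p in dist[j].items():
                    if l in scores:
                        scores[l] += weight[i][j] * p
            total = sum(scores.values())
            new_dist[i] = ({l: s / total for l, s in scores.items()}
                           if total > 0 else dist[i])
        dist = new_dist
    # Final label: the most probable canopy for every document.
    return [max(dist[i], key=dist[i].get) for i in range(n)]


def detect_topics(tweets, t1=0.9, t2=0.7, iterations=20):
    """Stage 1: canopy clustering; stage 2: label propagation on the overlap."""
    docs = [set(t.lower().split()) for t in tweets]
    canopies = canopy_clusters(docs, t1, t2)
    return label_propagation(docs, canopies, iterations)


if __name__ == "__main__":
    tweets = [
        "earthquake hits city tonight",
        "earthquake damage in city",
        "football final score tonight",
        "football final goal",
    ]
    print(detect_topics(tweets, t1=0.9, t2=0.7))  # [0, 0, 1, 1]
```

On these four toy tweets, the third tweet falls into both canopies because it shares "tonight" with the first earthquake tweet. Propagation then assigns it to the football canopy, since its Jaccard similarity to the fourth tweet (0.4) outweighs its similarity to the first (1/7). In this sketch the number of detected topics equals the number of canopies, so it depends directly on the chosen thresholds.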
Pages: 97-101
Number of pages: 5