Bursty event detection from microblog: a distributed and incremental approach

Cited by: 21
Authors
Li, Jianxin [1]
Wen, Jianfeng [1]
Tai, Zhenying [1]
Zhang, Richong [1]
Yu, Weiren [1]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing, Peoples R China
Keywords
social network; event detection; temporal topic model; topic drifting
DOI
10.1002/cpe.3657
Chinese Library Classification number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
As a new form of social media, microblogs (e.g., Twitter and Weibo) play an important role in people's daily lives. As microblogs grow in popularity and size, there is a need for distributed approaches that can detect bursty events with low latency from the short-text data stream. In this paper, we propose a distributed and incremental temporal topic model for microblogs called Bursty Event dEtection (BEE+). BEE+ is able to detect bursty events from short-text datasets and to model their temporal information. BEE+ also processes the post stream incrementally to track the topic drifting of events over time, so the latent semantic indices are preserved from one time period to the next. To achieve real-time processing, we design a distributed execution framework based on the Spark engine. To verify its ability to detect bursty events, we conduct experiments on a Weibo dataset of 6,360,125 posts. The results show that BEE+ can outperform the baselines in detecting meaningful bursty events and tracking topic drifting. Copyright (C) 2015 John Wiley & Sons, Ltd.
Pages: 3115-3130
Number of pages: 16