A probabilistic method for emerging topic tracking in Microblog stream

Cited by: 100
Authors
Huang, Jiajia [1]
Peng, Min [1]
Wang, Hua [2]
Cao, Jinli [3]
Gao, Wang [1]
Zhang, Xiuzhen [4]
Affiliations
[1] Wuhan Univ, State Key Lab Software Engn, Wuhan 430072, Peoples R China
[2] Victoria Univ, Ctr Appl Informat, Melbourne, Vic 3001, Australia
[3] La Trobe Univ, Comp Sci & Comp Engn, Bundoora, Vic 3086, Australia
[4] RMIT Univ, Sch CS&IT, GPO Box 2476, Melbourne, Vic 3001, Australia
Source
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2017, Vol. 20, No. 2
Funding
National Science Foundation (USA);
Keywords
Microblog stream; Emerging topic; LWLR; Topic evolution; Optimization problem;
DOI
10.1007/s11280-016-0390-4
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Microblogs are a popular, open platform for discovering and sharing the latest news about social issues and daily life. Because microblog streams are updated so quickly, effective tools for monitoring them are urgently needed. Emerging topic tracking is one such tool: it reveals which new events are currently attracting the most online attention. However, because microblog feeds change quickly, are noisy, and are short, emerging topic tracking must address two challenges. The first is detecting emerging topics early, long before they become hot; the second is monitoring evolving topics effectively over time. In this study, we propose a novel emerging topic tracking method that aligns emerging word detection from a temporal perspective with coherent topic mining from a spatial perspective. Specifically, we first design a metric that estimates word novelty and fading based on locally weighted linear regression (LWLR); it highlights the novelty of words that express an emerging topic and suppresses the novelty of words that express an existing topic. We then track emerging topics by leveraging topic novelty and fading probabilities, which are learned by formulating and solving an optimization problem. We evaluate our method on a microblog stream containing over one million feeds. Experimental results show that the proposed method performs promisingly, in both effectiveness and efficiency, at detecting emerging topics and tracking topic evolution over time.
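The abstract names locally weighted linear regression (LWLR) as the basis of the word novelty metric but does not give the formula. As a rough illustration only, the sketch below fits a standard Gaussian-kernel LWLR to a word's past per-slot frequencies, predicts the next slot, and scores how far the observed count exceeds that prediction. The function names, the bandwidth `tau`, and the score itself are assumptions made for this example and are not the metric defined in the paper.

```python
import numpy as np


def lwlr_predict(t, y, t_query, tau=2.0):
    """Predict y at t_query with locally weighted linear regression.

    Points near t_query get larger Gaussian-kernel weights; tau is the
    kernel bandwidth (an assumed value, not taken from the paper).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.exp(-((t - t_query) ** 2) / (2.0 * tau ** 2))
    # Design matrix with an intercept column and a time column.
    X = np.column_stack([np.ones_like(t), t])
    # Weighted least squares: scale rows by sqrt(weight), then solve.
    sw = np.sqrt(weights)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta[0] + theta[1] * t_query


def novelty_score(history, current, tau=2.0):
    """Hypothetical novelty score in [0, 1).

    Compares the observed frequency in the current time slot with the
    frequency LWLR extrapolates from the word's history. A word that
    follows its own trend scores near 0; a sudden burst scores near 1.
    """
    t = np.arange(len(history))
    predicted = max(lwlr_predict(t, history, len(history), tau), 0.0)
    excess = max(current - predicted, 0.0)
    return excess / (current + predicted + 1e-9)
```

Under these assumptions, a word with a flat history such as `[5, 5, 5, 5, 5]` that jumps to 50 in the current slot receives a high score, while a word that keeps growing along its established trend receives a score near zero, which mirrors the abstract's goal of highlighting words tied to emerging topics and suppressing words tied to existing ones.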
Pages: 325-350
Number of pages: 26