Event Detection and Summarization Using Phrase Network

Times cited: 5
Authors
Melvin, Sara [1]
Yu, Wenchao [1]
Ju, Peng [1]
Young, Sean [2]
Wang, Wei [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] Univ Calif Los Angeles, Univ Calif Inst Predict Technol, Los Angeles, CA USA
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT III | 2017 / Vol. 10536
Keywords
Event detection; Phrase network; Event summarization
DOI
10.1007/978-3-319-71273-4_8
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Identifying events in real-time data streams such as Twitter is crucial in many occupations for making timely, actionable decisions. It is, however, extremely challenging because of the subtle difference between "events" and trending topics, the inherent rarity of such events, and the complexity of modern Internet text data. Existing approaches often use topic modeling techniques and keyword frequency to detect events on Twitter, and they have three main limitations: (1) supervised and semi-supervised methods run the risk of missing important, breaking news events; (2) existing topic/event detection models are based on words, ignoring the correlations among phrases; (3) many previous methods identify trending topics as events. To address these limitations, we propose PhraseNet, an algorithm that detects and summarizes events from tweets. First, topics are defined as clusters of high-frequency phrases extracted from the text. Trending topics are then identified from temporal spikes in the phrase-cluster frequencies. Finally, PhraseNet separates high-confidence events from other trending topics using the number of peaks and the variance of peak intensity. We evaluate PhraseNet on three months of Twitter data and show both the efficiency and the effectiveness of our approach.
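The spike-and-filter step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PhraseNet implementation. Phrase extraction and clustering are assumed to have been done already. All function and parameter names (`cluster_series`, `find_peaks`, `peak_features`, `is_event`, `k`, `max_peaks`, `max_var`) are hypothetical, and several rules are assumptions made for illustration only:

- a spike is a local maximum above mean + k standard deviations;
- peak intensities are normalized by the series maximum;
- the direction of the decision rule treats a cluster as an event when it shows few peaks of comparable strength.

```python
from statistics import mean, pstdev, pvariance


def cluster_series(phrase_counts, clusters):
    """Sum per-phrase frequency series into one series per phrase cluster.

    phrase_counts: phrase -> list of counts per time step (equal lengths
    assumed within a cluster). clusters: phrase -> cluster id, produced by a
    preceding phrase-clustering step (hypothetical input format).
    """
    series = {}
    for phrase, counts in phrase_counts.items():
        cid = clusters[phrase]
        if cid not in series:
            series[cid] = [0] * len(counts)
        series[cid] = [a + b for a, b in zip(series[cid], counts)]
    return series


def find_peaks(counts, k=2.0):
    """Indices of local maxima above mean + k * std (assumed spike rule)."""
    threshold = mean(counts) + k * pstdev(counts)
    peaks = []
    for t, c in enumerate(counts):
        left = counts[t - 1] if t > 0 else float("-inf")
        right = counts[t + 1] if t + 1 < len(counts) else float("-inf")
        # Strict on the left, non-strict on the right, so a flat-topped
        # spike is reported once, at its first time step.
        if c > threshold and c > left and c >= right:
            peaks.append(t)
    return peaks


def peak_features(counts, k=2.0):
    """Return (number of peaks, variance of normalized peak intensity).

    Intensities are divided by the series maximum (an assumption) so the
    variance does not depend on overall tweet volume.
    """
    peaks = find_peaks(counts, k)
    if not peaks:
        return 0, 0.0
    top = max(counts)
    intensities = [counts[t] / top for t in peaks]
    return len(peaks), pvariance(intensities)


def is_event(counts, k=2.0, max_peaks=2, max_var=0.05):
    """Hypothetical rule: a trending cluster (at least one peak) counts as an
    event if it has few peaks of comparable strength. The rule's direction
    and both thresholds are placeholders, not values from the paper."""
    n_peaks, var = peak_features(counts, k)
    return 1 <= n_peaks <= max_peaks and var <= max_var


if __name__ == "__main__":
    phrase_counts = {
        "earthquake": [1, 0, 1, 0, 20, 2, 0, 1, 1, 0],
        "aftershock": [0, 1, 1, 1, 10, 1, 1, 0, 1, 1],
        "happy friday": ([0] * 6 + [10]) * 4,  # recurring weekly spike
    }
    clusters = {"earthquake": 0, "aftershock": 0, "happy friday": 1}
    for cid, counts in cluster_series(phrase_counts, clusters).items():
        print(cid, peak_features(counts), is_event(counts))
    # 0 (1, 0.0) True
    # 1 (4, 0.0) False
```

Normalizing by the series maximum keeps `max_var` independent of tweet volume. The recurring cluster is trending, because it has peaks, but it is rejected under the assumed rule because its peaks repeat too often.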
Pages: 89-101
Page count: 13