News Event Detection Using Random Walk with Restart

Cited by: 0
Authors
Chen, Lun-Chi [1]
Liao, I-En [1]
Chen, Chi-Hao [1]
Affiliation
[1] Natl Chung Hsing Univ, Dept Comp Sci & Engn, Taichung, Taiwan
Source
INTELLIGENT SYSTEMS AND APPLICATIONS (ICS 2014) | 2015 / Vol. 274
Keywords
Random Walk with Restart; News Event Detection; Named Entity; Clustering;
DOI
10.3233/978-1-61499-484-8-611
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
News datasets are among the most abundant sources of data recording events happening around people. For news event detection, people usually need to collect related news and explore major events manually. Exploring major events in large news datasets is difficult because the amount of data grows quickly with the rapid development of the Web and because news articles consist of unstructured data. How to discover events from such unstructured articles has therefore become an important problem. In this paper, we propose an event detection algorithm based on a five-dimensional named-entity feature and random walk with restart to detect events in unstructured news articles. The first part of the algorithm categorizes news terms into five predefined named-entity types by exploring Wikipedia Web pages, in order to generate more distinctive features for each news article. The second part aggregates news articles by their similarity using a random walk with restart clustering algorithm. The experimental results show that the proposed algorithm is effective. In particular, they demonstrate that the algorithm provides better event detection quality than other approaches in its ability to handle multi-event news articles.
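The abstract names random walk with restart (RWR) as the clustering engine but does not give the graph construction or the update rule. As a rough illustration only, the sketch below uses the textbook RWR iteration r = (1 - c) W r + c e on a small article-similarity graph. The function name, the restart probability of 0.15, and the toy similarity matrix are all assumptions made for this example and are not taken from the paper.

```python
def random_walk_with_restart(weights, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Return steady-state relevance scores of every node with respect to `seed`.

    weights[i][j] is a non-negative similarity between articles i and j
    (hypothetical values; the paper's similarity measure is not reproduced here).
    """
    n = len(weights)
    # Column-normalize so that transition[i][j] is the probability of
    # stepping from node j to node i.
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [weights[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Restart vector: all probability mass on the seed node.
    e = [0.0] * n
    e[seed] = 1.0
    r = e[:]
    for _ in range(max_iter):
        # One RWR step: walk with probability (1 - restart), jump back to the seed otherwise.
        nxt = [
            (1.0 - restart) * sum(transition[i][j] * r[j] for j in range(n)) + restart * e[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, r))
        r = nxt
        if delta < tol:
            break
    return r


# Toy graph: articles 0 and 1 are strongly similar, as are articles 2 and 3,
# with only weak links between the two groups.
toy_weights = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
scores = random_walk_with_restart(toy_weights, seed=0)
```

Articles whose scores relative to a seed are high can then be grouped with that seed. How the paper thresholds or merges such scores into events is not stated in the abstract, so that step is left out here.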
Pages: 611-620
Page count: 10