Incremental Clustering of News Reports

Cited by: 22
Authors
Azzopardi, Joel [1]
Staff, Christopher [1]
Affiliation
[1] Univ Malta, Fac ICT, Msida 2080, MSD, Malta
Keywords
clustering; news; event detection; incremental clustering
DOI
10.3390/a5030364
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
When an event occurs in the real world, numerous news reports describing it start to appear on different news sites within a few minutes of its occurrence. This can produce a huge amount of information for users, and automated processes may be required to help manage it. In this paper, we describe a clustering system that groups news reports from disparate sources into event-centric clusters, i.e., clusters of news reports describing the same event. A user can nominate any RSS feed as a source of news he/she would like to receive, and our system clusters the reports received from the separate RSS feeds as they arrive, without knowing the number of clusters in advance. The system was designed to function well in an online incremental environment. In our evaluation, we found that the system performs very well at fine-grained clustering, but rather poorly at coarser-grained clustering.
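The abstract does not give the similarity measure, term weighting, or thresholds the authors use, so the following is only a minimal sketch of the general idea of incremental clustering without a preset number of clusters. It assumes raw term counts, cosine similarity against cluster centroids, and a fixed threshold; the names `IncrementalClusterer`, `tokenize`, `cosine`, and the value `0.3` are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    if not a or not b:
        return 0.0
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)


class IncrementalClusterer:
    """Single-pass clustering: each report is assigned on arrival.

    Illustrative only; the threshold and weighting are assumptions,
    not the method evaluated in the paper.
    """

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.centroids = []  # one Counter of summed term counts per cluster
        self.members = []    # one list of report ids per cluster

    def add(self, report_id, text):
        """Assign a report to the most similar cluster, or start a new one."""
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Fold the new report into the existing cluster's centroid.
            self.centroids[best].update(vec)
            self.members[best].append(report_id)
            return best
        # No cluster is similar enough: the report starts a new event.
        self.centroids.append(vec)
        self.members.append([report_id])
        return len(self.centroids) - 1
```

In a sketch like this, the threshold controls granularity: a higher value tends to split reports into finer event-level clusters, while a lower value merges them into broader topics.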
Pages: 364-378
Page count: 15