Use of topicality and information measures to improve document representation for story link detection

Cited by: 0
Authors
Shah, Chirag [1 ,3 ]
Eguchi, Koji [2 ,3 ]
Affiliations
[1] Univ North Carolina, Chapel Hill, NC USA
[2] Kobe Univ, Kobe, Hyogo, Japan
[3] Natl Inst Informat, Tokyo 1018430, Japan
Source
ADVANCES IN INFORMATION RETRIEVAL | 2007 / Vol. 4425
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Several information organization, access, and filtering systems can benefit from different kinds of document representations than those used in traditional Information Retrieval (IR). Topic Detection and Tracking (TDT) is an example of such a domain. In this paper we demonstrate that traditional methods for term weighting do not capture topical information, and that this leads to inadequate representation of documents for TDT applications. We present various hypotheses regarding the factors that can help improve document representation for Story Link Detection (SLD), a core task of TDT. These hypotheses are tested using various TDT corpora. From our experiments and analysis we found that, in order to obtain a faithful representation of documents in the TDT domain, we need not only to capture a term's importance in the traditional IR sense, but also to evaluate its topical behavior. Along with defining this behavior, we propose a novel measure that captures a term's importance at the corpus level as well as its discriminating power for topics. This new measure leads to a much better document representation, as reflected by the significant improvements in the results.
Pages: 393 / +
Page count: 2