Index Maintenance for Time-Travel Text Search

Times cited: 0
Authors
Anand, Avishek [1]
Bedathur, Srikanta [2]
Berberich, Klaus [1]
Schenkel, Ralf [3]
Affiliations
[1] Max Planck Institute for Informatics, Saarbrücken, Germany
[2] IIIT Delhi, New Delhi, India
[3] Saarland University, Saarbrücken, Germany
Source
SIGIR 2012: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012
Keywords
Time-Travel Text Search; Index Maintenance; Web Archives
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Time-travel text search enriches standard text search with temporal predicates, so that users of web archives can easily retrieve document versions that are considered relevant to a given keyword query and existed during a given time interval. Different index structures have been proposed to efficiently support time-travel text search. None of them, however, can easily be updated as the Web evolves and new document versions are added to the web archive. In this work, we describe a novel index structure that efficiently supports time-travel text search and can be maintained incrementally as new document versions are added to the web archive. Our solution uses a sharded index organization, bounds the number of spuriously read index entries per shard, and can be maintained using small in-memory buffers and append-only operations. We present experiments on two large-scale real-world datasets demonstrating that maintaining our novel index structure is an order of magnitude more efficient than periodically rebuilding one of the existing index structures, while query-processing performance is not adversely affected.
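The query model described in the abstract, retrieving document versions that existed during a given time interval, can be illustrated with a minimal sketch. It assumes, hypothetically, that each index entry stores a document version's validity interval [begin, end) and matches a query when that interval overlaps the query interval [t_begin, t_end). The `Posting` type and the helper functions are illustrative names, not the authors' index structure; the sketch models neither sharding nor incremental maintenance. It only makes concrete what a "spuriously read" entry is: one that a scan reads but the temporal predicate then discards.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Posting:
    """Hypothetical index entry: one document version that contains a term."""
    doc_id: int
    score: float
    begin: int  # time at which this version came into existence
    end: int    # time at which it was superseded (exclusive)


def overlaps(p: Posting, t_begin: int, t_end: int) -> bool:
    """True if the version's lifetime [begin, end) intersects [t_begin, t_end)."""
    return p.begin < t_end and t_begin < p.end


def time_travel_lookup(postings: List[Posting], t_begin: int, t_end: int) -> List[Posting]:
    """Scan a whole list and keep only versions that existed during the query interval."""
    return [p for p in postings if overlaps(p, t_begin, t_end)]


def spurious_reads(postings: List[Posting], t_begin: int, t_end: int) -> int:
    """Entries the scan reads but the temporal predicate discards."""
    return len(postings) - len(time_travel_lookup(postings, t_begin, t_end))


if __name__ == "__main__":
    plist = [
        Posting(doc_id=1, score=0.9, begin=0, end=10),
        Posting(doc_id=1, score=0.7, begin=10, end=25),
        Posting(doc_id=2, score=0.8, begin=5, end=8),
        Posting(doc_id=3, score=0.4, begin=30, end=40),
    ]
    hits = time_travel_lookup(plist, 7, 12)
    print([(p.doc_id, p.begin) for p in hits])  # [(1, 0), (1, 10), (2, 5)]
    print(spurious_reads(plist, 7, 12))         # 1
```

On a single unsharded list, every entry outside the query interval is read spuriously, so the waste grows with the archive's history. The abstract's sharded organization is aimed at bounding that waste per shard.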
Pages: 235-243
Number of pages: 9