Efficient online index construction for text databases

被引:25
作者
Lester, Nicholas [3 ]
Moffat, Alistair [1 ]
Zobel, Justin [2 ]
机构
[1] Univ Melbourne, Dept Comp Sci & Software Engn, Melbourne, Vic 3010, Australia
[2] RMIT Univ, Sch Comp Sci & Informat Technol, Melbourne, Vic 3001, Australia
[3] Microsoft Corp, Redmond, WA 98052 USA
来源
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2008年 / 33卷 / 03期
关键词
algorithms; performance; index construction; index update; search engines; text indexing;
D O I
10.1145/1386118.1386125
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Inverted index structures are a core element of current text retrieval systems. They can be constructed quickly using offline approaches, in which one or more passes are made over a static set of input data, and, at the completion of the process, an index is available for querying. However, there are search environments in which even a small delay in timeliness cannot be tolerated, and the index must always be queryable and up to date. Here we describe and analyze a geometric partitioning mechanism for online index construction that provides a range of tradeoffs between costs, and can be adapted to different balances of insertion and querying operations. Detailed experimental results are provided that show the extent of these tradeoffs, and that these new methods can yield substantial savings in online indexing costs.
引用
收藏
页数:33
相关论文
共 37 条
[1]   Improved word-aligned binary compression for text indexing [J].
Anh, VN ;
Moffat, A .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, 18 (06) :857-861
[2]  
Baeza-Yates R.A., 1999, Modern Information Retrieval
[3]   Engineering a multi-purpose test collection for Web retrieval experiments [J].
Bailey, P ;
Craswell, N ;
Hawking, D .
INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (06) :853-871
[4]  
Bayer R., 1972, Acta Informatica, V1, P290, DOI 10.1007/BF00289509
[5]  
BILIRIS A, 1992, P ACM SIGMOD INT C M, P276
[6]  
BILIRIS A, 1992, P INT C DAT ENG, P301
[7]  
Brown E. W., 1994, P 20 INT C VER LARG, P192
[8]  
BUTCHER S, 2006, P ACM SIGIR INT C RE, P356
[9]  
BUTCHER S, 2006, P EUR C INF RETR ECI, P229
[10]  
BUTTCHER S, 2005, P 14 ACM INT C INF K, P317