Query-based sampling of text databases

Cited by: 170
Authors
Callan, J
Connell, M
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Univ Massachusetts, Dept Comp Sci, Ctr Intelligent Informat Retrieval, Amherst, MA 01003 USA
Keywords
algorithms; design; experimentation; distributed information retrieval; query-based sampling; resource ranking; resource selection; server selection
DOI
10.1145/382979.383040
Chinese Library Classification (CLC) number
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
The proliferation of searchable text databases on corporate networks and the Internet causes a database selection problem for many people. Algorithms such as gGlOSS and CORI can automatically select which text databases to search for a given information need, but only if given a set of resource descriptions that accurately represent the contents of each database. The existing techniques for acquiring resource descriptions have significant limitations when used in wide-area networks controlled by many parties. This paper presents query-based sampling, a new technique for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are created, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic database selection.
Pages: 97-130
Page count: 34