Retrieving information from a distributed heterogeneous document collection

Cited by: 2
Author
Baumgarten, C [1]
Affiliation
[1] Eurospider Informat Technol AG, Zurich, Switzerland
Source
INFORMATION RETRIEVAL | 2000 / Vol. 3 / Issue 3
Keywords
distributed information retrieval; heterogeneity; probability ranking principle;
DOI
10.1023/A:1026572910743
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
This paper describes a probabilistic model for optimum information retrieval in a distributed heterogeneous environment. The model assumes the collection of documents offered by the environment to be partitioned into subcollections. Documents as well as subcollections have to be indexed, where indexing methods using different indexing vocabularies can be employed. A query provided by a user is answered in terms of a ranked list of documents. The model determines a procedure for ranking the documents that stems from the Probability Ranking Principle: for each subcollection, the subcollection's documents are ranked; the resulting ranked lists are combined into a final ranked list of documents, where the ordering is determined by the documents' probabilities of being relevant with respect to the user's query. Various probabilistic ranking methods may be involved in the distributed ranking process. A criterion for effectively limiting the ranking process to a subset of subcollections extends the model. The property that different ranking methods and indexing vocabularies can be used is important when the subcollections are heterogeneous with respect to their content. The model's applicability is experimentally confirmed. When exploiting the degrees of freedom provided by the model, experiments showed evidence that the model even outperforms comparable models for the non-distributed case with respect to retrieval effectiveness.
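The merging step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `merge_ranked_lists`, the document identifiers, and the probability values are hypothetical, and the sketch assumes that each subcollection already returns its documents sorted by descending estimated probability of relevance and that these estimates are comparable across subcollections.

```python
import heapq
from typing import Iterable, List, Tuple

# (doc_id, estimated probability of relevance); both fields are illustrative.
Ranked = List[Tuple[str, float]]


def merge_ranked_lists(per_subcollection: Iterable[Ranked]) -> Ranked:
    """Combine per-subcollection rankings into one final ranked list.

    Assumption: every input list is already sorted by descending
    probability, and the probabilities are comparable across lists.
    """
    # heapq.merge performs a lazy k-way merge; reverse=True expects
    # inputs sorted from largest to smallest key.
    return list(
        heapq.merge(*per_subcollection, key=lambda pair: pair[1], reverse=True)
    )


# Hypothetical rankings returned by two subcollections.
sub_a = [("a1", 0.9), ("a2", 0.4)]
sub_b = [("b1", 0.7), ("b2", 0.1)]

final = merge_ranked_lists([sub_a, sub_b])
```

Under these assumptions, the final list is ordered purely by the documents' estimated probabilities of relevance, regardless of which subcollection or ranking method produced them.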
Pages: 253-271
Page count: 19