Distributed Processing of Similarity Queries

Cited by: 0
Authors
Apostolos N. Papadopoulos
Yannis Manolopoulos
Affiliations
[1] Aristotle University, Data Engineering Research Lab., Department of Informatics
[2] Aristotle University, Data Engineering Research Lab., Department of Informatics
Source
Distributed and Parallel Databases | 2001 / Vol. 9
Keywords
distributed databases; multidimensional data; similarity queries; query processing
DOI
Not available
Abstract
Many modern applications in diverse fields require efficient manipulation of very large multidimensional datasets, so efficient and effective query processing techniques are needed to provide acceptable response times. In this paper, we study the processing of similarity (nearest neighbor) queries in large distributed multidimensional databases, where objects are represented as vectors in a vector space and are distributed across a multi-computer environment. Departing from the centralized case brings a number of advantages and, unfortunately, a number of difficulties that must be overcome. From this perspective, four query evaluation strategies are presented: Concurrent Processing (CP), Selective Processing (SP), Two-Phase Processing (2PP), and Probabilistic Processing (PRP). The proposed techniques are compared analytically and experimentally to identify the advantages of each and the cases in which each is best applied. Experimental results demonstrate the performance of each method under different parameter values. We also investigate the impact of the derived data that must be maintained in order to process similarity queries efficiently.
Pages: 67-92
Number of pages: 25