Group Enclosing Queries

被引:31
作者
Li, Feifei [1 ]
Yao, Bin [1 ]
Kumar, Piyush [1 ]
机构
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
基金
美国国家科学基金会;
关键词
Aggregate nearest neighbor; approximate nearest neighbor; minmax nearest neighbor; nearest neighbor; SPATIAL DATABASES; TREE; DATASETS;
D O I
10.1109/TKDE.2010.181
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Given a set of points P and a query set Q, a group enclosing query (GEQ) fetches the point p* is an element of P such that the maximum distance of p* to all points in Q is minimized. This problem is equivalent to the Min-Max case (minimizing the maximum distance) of aggregate nearest neighbor queries for spatial databases [27]. This work first designs a new exact solution by exploring new geometric insights, such as the minimum enclosing ball, the convex hull, and the furthest voronoi diagram of the query group. To further reduce the query cost, especially when the dimensionality increases, we turn to approximation algorithms. Our main approximation algorithm has a worst case root 2-approximation ratio if one can find the exact nearest neighbor of a point. In practice, its approximation ratio never exceeds 1.05 for a large number of data sets up to six dimensions. We also discuss how to extend it to higher dimensions (up to 74 in our experiment) and show that it still maintains a very good approximation quality (still close to 1) and low query cost. In fixed dimensions, we extend the root 2-approximation algorithm to get a (1 + epsilon)-approximate solution for the GEQ problem. Both approximation algorithms have O(log N + M) query cost in any fixed dimension, where N and M are the sizes of the data set P and query group Q. Extensive experiments on both synthetic and real data sets, up to 10 million points and 74 dimensions, confirm the efficiency, effectiveness, and scalability of the proposed algorithms, especially their significant improvement over the state-of-the-art method.
引用
收藏
页码:1526 / 1540
页数:15
相关论文
共 34 条
[1]  
[Anonymous], P ACM SIGMOD INT C M
[2]  
[Anonymous], P ACM SIGMOD INT C M
[3]  
[Anonymous], 1999, P INT C VER LARG DAT
[4]   Approximate range searching [J].
Arya, S ;
Mount, DM .
COMPUTATIONAL GEOMETRY-THEORY AND APPLICATIONS, 2000, 17 (3-4) :135-152
[5]   An optimal algorithm for approximate nearest neighbor searching in fixed dimensions [J].
Arya, S ;
Mount, DM ;
Netanyahu, NS ;
Silverman, R ;
Wu, AY .
JOURNAL OF THE ACM, 1998, 45 (06) :891-923
[6]  
AURENHAMMER F, 1991, COMPUT SURV, V23, P345, DOI 10.1145/116873.116880
[7]  
BADOIU M, 2002, P ANN ACM S THEOR CO
[8]  
Beckmann Norbert., 1990, P ACM SIGMOD INT C M
[9]  
Berg M.d., 1997, COMPUTATIONAL GEOMET
[10]   A cost model for query processing in high dimensional data spaces [J].
Böhm, C .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2000, 25 (02) :129-178