High dimensional similarity search with space filling curves

被引:32
作者
Liao, SW [1 ]
Lopez, MA [1 ]
Leutenegger, ST [1 ]
机构
[1] Univ Denver, Dept Math & Comp Sci, Denver, CO 80208 USA
来源
17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2001年
关键词
D O I
10.1109/ICDE.2001.914876
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We present a new approach for approximate nearest neighbor queries for sets of high dimensional points under any L-t-metric, t = 1,..., infinity. The proposed algorithm is efficient and simple to implement. The algorithm uses multiple shifted copies of the data points and stores them in up to (d + 1) B-trees where d is the dimensionality of the data, sorted according to their position along a space filling curve. This is done in a way that allows us to guarantee that a neighbor within an O(d(1+1/t)) factor of the exact nearest, can be returned with at most (d + 1)log(p) n page accesses, where p is the branching factor of the B-trees. In practice, for real data sets, our approximate technique finds the exact nearest neighbor between 87% and 99% of the time and a point no farther than the third nearest neighbor between 98% and 100% of the time. Our solution is dynamic, allowing insertion or deletion of points in O(d log(p) n) page accesses and generalizes easily to find approximate k-nearest neighbors.
引用
收藏
页码:615 / 622
页数:8
相关论文
共 14 条
[1]   Space-filling curves and their use in the design of geometric data structures [J].
Asano, T ;
Ranjan, D ;
Roos, T ;
Welzl, E ;
Widmayer, P .
THEORETICAL COMPUTER SCIENCE, 1997, 181 (01) :3-15
[2]  
BERCHTOLD S, 1998, P ACM SIGMOD, P501
[3]  
Chan T. M., 1997, Proceedings of the Thirteenth Annual Symposium on Computational Geometry, P352, DOI 10.1145/262839.263001
[4]  
CRAVER S, 1999, SPIE, V3656, P155
[5]   Multidimensional access methods [J].
Gaede, V ;
Gunther, O .
ACM COMPUTING SURVEYS, 1998, 30 (02) :170-231
[6]  
Gionis A, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P518
[7]  
Katayama N., 1997, P ACM SIGMOD, P369
[8]   Texture features for browsing and retrieval of image data [J].
Manjunath, BS ;
Ma, WY .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1996, 18 (08) :837-842
[9]   A simple algorithm for nearest neighbor search in high dimensions [J].
Nene, SA ;
Nayar, SK .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (09) :989-1003
[10]  
ROUSSOPOULOS N, 1995, P ACM SIGMOD INT C M, P71, DOI DOI 10.1145/223784.223794