LDC: Enabling search by partial distance in a hyper-dimensional space

被引:20
作者
Koudas, N [1 ]
Ooi, BC [1 ]
Shen, HT [1 ]
Tung, AKH [1 ]
机构
[1] AT&T Labs Res, Shannon Lab, Florham Pk, NJ 07932 USA
来源
20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2004年
关键词
D O I
10.1109/ICDE.2004.1319980
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of hyper-dimensional databases which can contain hundreds or even thousands of dimensions. Such hyper-dimensional databases pose significant problems to existing high-dimensional indexing techniques which have been developed for indexing databases with (commonly) less than a hundred dimensions. To support efficient querying and retrieval on hyper-dimensional databases, we propose a methodology called Local Digital Coding (LDC) which can support k-nearest neighbors (KNN) queries on hyper-dimensional databases and yet co-exist with ubiquitous indices, such as B+-trees. LDC extracts a simple bitmap representation. called Digital Code (DC) for each point in the database. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the DC based on which subsequent comparisons are performed. In doing so, expensive operations involved in computing L-norm distance functions between hyper-dimensional data can be avoided, Extensive experiments are conducted to show that our methodology offers significant performance advantages over other existing indexing methods on both real life and synthetic hyper-dimensional datasets.
引用
收藏
页码:6 / 17
页数:12
相关论文
共 13 条
[1]  
Berchtold S., 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073), P577, DOI 10.1109/ICDE.2000.839456
[2]  
BERCHTOLD S, 1998, SIGMOD, P142
[3]  
Beyer K, 1999, LECT NOTES COMPUT SC, V1540, P217
[4]   Searching in high-dimensional spaces -: Index structures for improving the performance of multimedia Databases [J].
Böhm, C ;
Berchtold, S ;
Keim, D .
ACM COMPUTING SURVEYS, 2001, 33 (03) :322-373
[5]  
BORODIN A, 1999, P ACM STOC
[6]  
Chakrabarti K., 2000, VLDB C
[7]  
DEVRIES AP, 2002, SIGMOD C, P322
[8]  
Guha S., 1998, SIGMOD Record, V27, P73, DOI 10.1145/276305.276312
[9]  
JIN H, 2003, ICDE, P87
[10]  
SAKURAI Y, 2000, VLDB, P516