A compact space decomposition for effective metric indexing

被引:103
作者
Chávez, E
Navarro, G
机构
[1] Univ Michoacana, Escuela Ciencias Fisico Matemat, Morella 58000, Michoacan, Mexico
[2] Univ Chile, Dipartimento Ciencias Computac, Ctr Invest Web, Santiago, Chile
关键词
D O I
10.1016/j.patrec.2004.11.014
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that speeds up proximity queries. However, indexes lose their efficiency as the intrinsic data dimensionality increases. In this paper we present a simple index called list of clusters (LC), which is based on a compact partitioning of the data set. The LC is shown to require little space, to be suitable both for main and secondary memory implementations, and most importantly, to be very resistant to the intrinsic dimensionality of the data set. In this aspect our structure is unbeaten. We finish with a discussion of the role of unbalancing in metric space searching, and how it permits trading memory space for construction time. (c) 2004 Elsevier B.V. All rights reserved.
引用
收藏
页码:1363 / 1376
页数:14
相关论文
共 26 条
[1]  
[Anonymous], P 21 INT C VER LARG
[2]  
[Anonymous], P SPIRE CRIWG
[3]  
Aurenhammer F, 1991, ACM COMPUT SURV, P23
[4]  
Baeza-Yates R., 1994, LNCS, V807/1994, P198
[5]   Searching in high-dimensional spaces -: Index structures for improving the performance of multimedia Databases [J].
Böhm, C ;
Berchtold, S ;
Keim, D .
ACM COMPUTING SURVEYS, 2001, 33 (03) :322-373
[6]  
Bozkaya T., 1997, SIGMOD Record, V26, P357, DOI 10.1145/253262.253345
[7]  
BURKHARD WA, 1973, COMMUN ACM, V16, P230, DOI 10.1145/362003.362025
[8]   Searching in metric spaces [J].
Chávez, E ;
Navarro, G ;
BaezaYates, R ;
Marroquín, JL .
ACM COMPUTING SURVEYS, 2001, 33 (03) :273-321
[9]   Fixed queries array:: A fast and economical data structure for proximity searching [J].
Chávez, E ;
Marroquín, JL ;
Navarro, G .
MULTIMEDIA TOOLS AND APPLICATIONS, 2001, 14 (02) :113-135
[10]   An effective clustering algorithm to index high dimensional metric spaces [J].
Chávez, E ;
Navarro, G .
SPIRE 2000: SEVENTH INTERNATIONAL SYMPOSIUM ON STRING PROCESSING AND INFORMATION RETRIEVAL - PROCEEDINGS, 2000, :75-86