Metric Index: An Efficient and Scalable Solution for Similarity Search

被引:23
作者
Novak, David [1 ]
Batko, Michal [1 ]
机构
[1] Masaryk Univ, Brno, Czech Republic
来源
SISAP 2009: 2009 SECOND INTERNATIONAL WORKSHOP ON SIMILARITY SEARCH AND APPLICATIONS, PROCEEDINGS | 2009年
关键词
metric space; similarity search; data structure; approximation; scalability;
D O I
10.1109/SISAP.2009.26
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Metric space as a universal and versatile model of similarity can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management is still a resisting research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index), that employs practically all known principles of metric space partitioning, pruning and filtering. The heart of the M-Index is a general mapping mechanism that enables to actually store the data in well-established structures such as the B+-tree or even in a distributed storage. We have implemented the M-Index with B+-tree and performed experiments on a combination of five MPEG-7 descriptors in a database of hundreds of thousands digital images. The experiments put under test several M-Index variants and compare them with two orthogonal approaches - the PM-Tree and the iDistance. The trials show that the M-Index outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Furthermore, the M-Index demonstrates an excellent ability to keep similar data close in the index which makes its approximation algorithm very efficient - maintaining practically constant response times while preserving a very high recall as the dataset grows.
引用
收藏
页码:65 / 73
页数:9
相关论文
共 21 条
[1]  
[Anonymous], P 21 C VER LARG DAT
[2]  
[Anonymous], 2006, ADV DATABASE SYSTEMS
[3]  
Aspnes J, 2003, SIAM PROC S, P384
[4]  
Batko M, 2007, LECT NOTES COMPUT SC, V4877, P1
[5]  
Bolettieri P., 2009, CORR
[6]   Pivot selection techniques for proximity searching in metric spaces [J].
Bustos, B ;
Navarro, G ;
Chávez, E .
PATTERN RECOGNITION LETTERS, 2003, 24 (14) :2357-2366
[7]  
CHVEZ E, 2000, TRDCC001 U CHIL DEP
[8]  
Ciaccia P, 1997, PROCEEDINGS OF THE TWENTY-THIRD INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, P426
[9]   D-Index: Distance searching index for metric data sets [J].
Dohnal, V ;
Gennaro, C ;
Savino, P ;
Zezula, P .
MULTIMEDIA TOOLS AND APPLICATIONS, 2003, 21 (01) :9-33
[10]  
Fagin R., 1979, ACM Transactions on Database Systems, V4, P315, DOI 10.1145/320083.320092