Pivot-based Metric Indexing

被引:41
作者
Chen, Lu [1 ,3 ]
Gao, Yunjun [1 ,2 ]
Zheng, Baihua [3 ]
Jensen, Christian S. [4 ]
Yang, Hanyu [1 ]
Yang, Keyu [1 ]
机构
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China
[2] Zhejiang Univ, Key Lab Big Data Intelligent Comp Zhejiang Prov, Hangzhou, Zhejiang, Peoples R China
[3] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
[4] Aalborg Univ, Dept Comp Sci, Aalborg, Denmark
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2017年 / 10卷 / 10期
关键词
SIMILARITY SEARCH; SPACES; TREE;
D O I
10.14778/3115404.3115411
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed, which reduces the number of potentially expensive similarity comparisons by exploiting the triangle inequality for pruning and validation. However, no comprehensive empirical study of those techniques exists. Existing studies each offers only a narrower coverage, and they use different pivot selection strategies that affect performance substantially and thus render cross-study comparisons difficult or impossible. We offer a survey of existing pivot-based indexing techniques, and report a comprehensive empirical comparison of their construction costs, update efficiency, storage sizes, and similarity search performance. As part of the study, we provide modifications for two existing indexing techniques to make them more competitive. The findings and insights obtained from the study reveal different strengths and weaknesses of different indexing techniques, and offer guidance on selecting an appropriate indexing technique for a given setting.
引用
收藏
页码:1058 / 1069
页数:12
相关论文
共 30 条
  • [1] Almeida J, 2010, CIKM, P1365
  • [2] Optimal Pivots to Minimize the Index Size for Metric Access Methods
    Ares, Luis G.
    Brisaboa, Nieves R.
    Esteller, Maria F.
    Pedreira, Oscar
    Places, Angeles S.
    [J]. SISAP 2009: 2009 SECOND INTERNATIONAL WORKSHOP ON SIMILARITY SEARCH AND APPLICATIONS, PROCEEDINGS, 2009, : 74 - 80
  • [3] CM-tree: A dynamic clustered index for similarity search in metric databases
    Aronovich, Lior
    Spiegler, Israel
    [J]. DATA & KNOWLEDGE ENGINEERING, 2007, 63 (03) : 919 - 946
  • [4] Baeza-Yates R., 1994, Combinatorial Pattern Matching. 5th Annual Symposium, CPM 94. Proceedings, P198
  • [5] Indexing large metric spaces for similarity search queries
    Bozkaya, T
    Ozsoyoglu, M
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 1999, 24 (03): : 361 - 404
  • [6] Bozkaya T., 1997, SIGMOD Record, V26, P357, DOI 10.1145/253262.253345
  • [7] Brin S., 1995, VLDB '95. Proceedings of the 21st International Conference on Very Large Data Bases, P574
  • [8] BURKHARD WA, 1973, COMMUN ACM, V16, P230, DOI 10.1145/362003.362025
  • [9] Pivot selection techniques for proximity searching in metric spaces
    Bustos, B
    Navarro, G
    Chávez, E
    [J]. PATTERN RECOGNITION LETTERS, 2003, 24 (14) : 2357 - 2366
  • [10] A compact space decomposition for effective metric indexing
    Chávez, E
    Navarro, G
    [J]. PATTERN RECOGNITION LETTERS, 2005, 26 (09) : 1363 - 1376