Metric space similarity joins

被引:79
|
作者
Jacox, Edwin H. [1 ]
Samet, Hanan
机构
[1] Univ Maryland, Dept Comp Sci, Ctr Automat Res, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Adv Comp Studies, College Pk, MD 20742 USA
来源
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2008年 / 33卷 / 02期
关键词
algorithms; performance; similarity join; external memory algorithms; distance-based indexing; nearest neighbor queries; range queries; ranking;
D O I
10.1145/1366102.1366104
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Similarity join algorithms find pairs of objects that lie within a certain distance epsilon of each other. Algorithms that are adapted from spatial join techniques are designed primarily for data in a vector space and often employ some form of a multidimensional index. For these algorithms, when the data lies in a metric space, the usual solution is to embed the data in vector space and then make use of a multidimensional index. Such an approach has a number of drawbacks when the data is high dimensional as we must eventually find the most discriminating dimensions, which is not trivial. In addition, although the maximum distance between objects increases with dimension, the ability to discriminate between objects in each dimension does not. These drawbacks are overcome via the introduction of a new method called Quickjoin that does not require a multidimensional index and instead adapts techniques used in distance-based indexing for use in a method that is conceptually similar to the Quicksort algorithm. A formal analysis is provided of the Quickjoin method. Experiments show that the Quickjoin method significantly outperforms two existing techniques.
引用
收藏
页数:38
相关论文
共 50 条
  • [1] Quicker Similarity Joins in Metric Spaces
    Fredriksson, Kimmo
    Braithwaite, Billy
    SIMILARITY SEARCH AND APPLICATIONS (SISAP), 2013, 8199 : 127 - 140
  • [2] Metric Similarity Joins Using MapReduce
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (03) : 656 - 669
  • [3] Efficient Metric Indexing for Similarity Search and Similarity Joins
    Chen, Lu
    Gao, Yunjun
    Li, Xinhan
    Jensen, Christian S.
    Chen, Gang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (03) : 556 - 571
  • [4] Exploiting Database Similarity Joins for Metric Spaces
    Silva, Yasin N.
    Pearson, Spencer
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (12): : 1922 - 1925
  • [5] Accelerating Metric Space Similarity Joins with Multi-core and Many-core Processors
    Jin, Shichao
    Kim, Okhee
    Feng, Wenya
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2013, PT V, 2013, 7975 : 166 - 180
  • [6] Accelerating metric space similarity joins with multi-core and many-core processors
    Jin, Shichao
    Kim, Okhee
    Feng, Wenya
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013, 7971 : 166 - 180
  • [7] Metric Similarity Joins Using MapReduce (Extended abstract)
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1787 - 1788
  • [8] An efficient algorithm for approximated self-similarity joins in metric spaces
    Ferrada, Sebastian
    Bustos, Benjamin
    Reyes, Nora
    INFORMATION SYSTEMS, 2020, 91
  • [9] List of twin clusters: a data structure for similarity joins in metric spaces
    Paredes, Rodrigo
    Reyes, Nora
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1 AND 2, 2008, : 578 - +
  • [10] List of twin clusters: a data structure for similarity joins in metric spaces
    Paredes, Rodrigo
    Reyes, Nora
    SISAP 2008: FIRST INTERNATIONAL WORKSHOP ON SIMILARITY SEARCH AND APPLICATIONS, PROCEEDINGS, 2008, : 131 - 138