Quicker range- and k-NN joins in metric spaces

Cited by: 4
Authors
Fredriksson, Kimmo [1 ]
Braithwaite, Billy [1 ]
Affiliations
[1] Univ Eastern Finland, Sch Comp, Kuopio 70211, Finland
Keywords
Metric spaces; Similarity joins; k-nearest neighbors; Range searching; Queries; Search; Index
DOI
10.1016/j.is.2014.09.006
Chinese Library Classification (CLC)
TP [Automation and computer technology]
Discipline code
0812
Abstract
We consider the join operation in metric spaces. Given two sets A and B of objects drawn from some universe U, we want to efficiently compute the set A ⋈ B = {(a, b) ∈ A × B | d(a, b) ≤ r}, where d : U × U → R⁺ is a metric distance function and r ∈ R⁺ is a user-supplied query radius. In particular, we are interested in the case where no index is available for either A or B, nor can we afford to build one. In this paper we improve the Quickjoin algorithm [6], inspired by the well-known Quicksort algorithm, by (i) replacing the low-level component that handles small subsets, essentially a brute-force nested loop, with a more efficient method; (ii) showing that, contrary to Quicksort, unbalanced partitioning can improve Quickjoin; (iii) making the algorithm probabilistic while still obtaining most of the relevant results; and (iv) improving the (worst-case) work space needed from O(n²) to O(log n), where n is the size of the dataset(s). We also discuss pivot selection, show how to adapt Quickjoin to compute k-nearest-neighbor joins, and sketch a parallel version of the algorithm. The experimental results show that the method works well in practice. (C) 2014 Elsevier Ltd. All rights reserved.
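To make the join definition and the Quicksort-like partitioning idea concrete, the following is a minimal Python sketch of a Quickjoin-style recursive self range join: ball partitioning around a random pivot, window partitions near the partition boundary, and a brute-force nested loop on small subsets. The Euclidean distance, the small_limit threshold, and all function names are illustrative assumptions, not the paper's implementation, and none of the improvements (i)-(iv) listed in the abstract are included.

    import math
    import random

    def dist(a, b):
        # Illustrative metric: Euclidean distance between coordinate tuples.
        # The partitioning scheme itself relies only on the metric axioms.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def nested_loop(objs, r, out):
        # Brute-force baseline, used once a subset is small enough.
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                if dist(objs[i], objs[j]) <= r:
                    out.append((objs[i], objs[j]))

    def quickjoin(objs, r, out, small_limit=16):
        # Self range join: report all pairs within distance r of each other.
        if len(objs) <= small_limit:
            nested_loop(objs, r, out)
            return
        p, q = random.sample(objs, 2)   # pivot and a second random object
        rho = dist(p, q)                # partitioning radius
        if rho == 0.0:                  # degenerate pivot: avoid endless recursion
            nested_loop(objs, r, out)
            return
        # Distances to p are recomputed below for clarity only.
        L = [o for o in objs if dist(o, p) < rho]    # inside the pivot ball
        G = [o for o in objs if dist(o, p) >= rho]   # outside the pivot ball
        # Window partitions: objects at most r from the ball boundary; any
        # result pair straddling the boundary must involve only these objects.
        Lw = [o for o in L if dist(o, p) >= rho - r]
        Gw = [o for o in G if dist(o, p) < rho + r]
        quickjoin(L, r, out, small_limit)
        quickjoin(G, r, out, small_limit)
        quickjoin_win(Lw, Gw, r, out, small_limit)

    def quickjoin_win(A, B, r, out, small_limit=16):
        # Report pairs (a, b) with a in A, b in B and d(a, b) <= r.
        def brute(X, Y):
            for a in X:
                for b in Y:
                    if dist(a, b) <= r:
                        out.append((a, b))
        if len(A) + len(B) <= small_limit:
            brute(A, B)
            return
        p, q = random.sample(A + B, 2)
        rho = dist(p, q)
        if rho == 0.0:
            brute(A, B)
            return
        LA = [o for o in A if dist(o, p) < rho]
        GA = [o for o in A if dist(o, p) >= rho]
        LB = [o for o in B if dist(o, p) < rho]
        GB = [o for o in B if dist(o, p) >= rho]
        LAw = [o for o in LA if dist(o, p) >= rho - r]
        GAw = [o for o in GA if dist(o, p) < rho + r]
        LBw = [o for o in LB if dist(o, p) >= rho - r]
        GBw = [o for o in GB if dist(o, p) < rho + r]
        quickjoin_win(LA, LB, r, out, small_limit)
        quickjoin_win(GA, GB, r, out, small_limit)
        for X, Y in ((LAw, GBw), (GAw, LBw)):
            if len(X) + len(Y) < len(A) + len(B):
                quickjoin_win(X, Y, r, out, small_limit)
            else:
                brute(X, Y)   # no progress (e.g. duplicated data): fall back

The nested-loop fallback in this sketch is exactly the low-level component that improvement (i) of the paper replaces with a more efficient method, and the pivot choice here is uniformly random, whereas the paper discusses pivot selection and shows that deliberately unbalanced partitioning can help.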
Pages: 189-204
Page count: 16
Related papers
50 results in total
  • [31] Solving similarity joins and range queries in metric spaces with the list of twin clusters
    Paredes, Rodrigo
    Reyes, Nora
    JOURNAL OF DISCRETE ALGORITHMS, 2009, 7 (01) : 18 - 35
  • [32] A Note on the k-NN Density Estimate
    Ding, Jie
    Zhu, Xinshan
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2016, 2016, 9937 : 79 - 88
  • [33] Exploring the Feature Selection of the EEG Signal Time and Frequency Domain Features for k-NN and Weighted k-NN
    Diah, Theresia K.
    Faqih, Akhmad
    Kusumoputro, Benyamin
    PROCEEDINGS OF 2019 IEEE R10 HUMANITARIAN TECHNOLOGY CONFERENCE (IEEE R10 HTC 2019), 2019, : 196 - 199
  • [34] MNIST Dataset Classification Utilizing k-NN Classifier with Modified Sliding-Window Metric
    Grover, Divas
    Toghi, Behrad
    ADVANCES IN COMPUTER VISION, VOL 2, 2020, 944 : 583 - 591
  • [35] Achieving the time of 1-NN, but the accuracy of k-NN
    Xue, Lirong
    Kpotufe, Samory
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [36] K-NN: ESTIMATING AN ADEQUATE VALUE FOR PARAMETER K
    Borsato, Bruno
    Plastino, Alexandre
    Merschmann, Luiz
    ICEIS 2008: PROCEEDINGS OF THE TENTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL AIDSS: ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS, 2008, : 459 - +
  • [37] New query processing algorithms for range and k-NN search in spatial network databases
    Chang, Jae-Woo
    Kim, Yong-Ki
    Kim, Sang-Mi
    Kim, Young-Chang
    ADVANCES IN CONCEPTUAL MODELING - THEORY AND PRACTICE, PROCEEDINGS, 2006, 4231 : 130 - +
  • [38] An automatic selection method of k in k-NN classifier
    Du, L.
    2013, Northeast University, 28
  • [39] k-NN classifiers: investigating the k = k (n) relationship
    Alippi, C.
    Fuhrman, M.
    Roveri, M.
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 3676 - +
  • [40] Fuzzy k-NN applied to moulds detection
    Kuske, M
    Rubio, R
    Romain, AC
    Nicolas, J
    Marco, S
    SENSORS AND ACTUATORS B-CHEMICAL, 2005, 106 (01) : 52 - 60