Quicker range- and k-NN joins in metric spaces

被引:4
|
作者
Fredriksson, Kimmo [1 ]
Braithwaite, Billy [1 ]
机构
[1] Univ Eastern Finland, Sch Comp, Kuopio 70211, Finland
关键词
Metric spaces; Similarity joins; k-nearest neighbors; Range searching; QUERIES; SEARCH; INDEX;
D O I
10.1016/j.is.2014.09.006
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We consider the join operation in metric spaces. Given two sets A and B of objects drawn from some universe U, we want to compute the set A x B = {(a, b) is an element of A (sic) B vertical bar d(a, b) <= r) efficiently, where d: U x U -> R+ is a metric distance function and r is an element of R+ is user supplied query radius. In particular we are interested in the case where we have no index available (nor we can afford to build it) for either A or B. In this paper we improve the Quickjoin algorithm [6], inspired by the well-known Quicksort algorithm, by (i) replacing the low level component that handles small subsets with essentially brute-force nested loop with a more efficient method; (ii) showing that, contrary to Quicksort, in Quickjoin unbalanced partitioning can improve the algorithm; (iii) making the algorithm probabilistic while still obtaining most of the relevant results; and (iv) improving the (worst case) work space needed, from O(n(2)) to O(logn), where n is the size of the dataset(s). We also discuss pivot selection, show how to adapt Quickjoin to compute k-nearest neighbor joins and sketch a parallel version of the algorithm. The experimental results show that the method works well in practice. (C) 2014 Elsevier Ltd. All rights reserved.
引用
收藏
页码:189 / 204
页数:16
相关论文
共 50 条
  • [41] DBSCAN Parameter Selection Based on K-NN
    Delgado, Leonardo
    Morales, Eduardo F.
    ADVANCES IN COMPUTATIONAL INTELLIGENCE (MICAI 2021), PT I, 2021, 13067 : 187 - 198
  • [42] Leveraging k-NN for generic classification boosting
    Piro, Paolo
    Nock, Richard
    Nielsen, Frank
    Barlaud, Michel
    NEUROCOMPUTING, 2012, 80 : 3 - 9
  • [43] Voting over multiple k-NN classifiers
    Grabowski, S
    MODERN PROBLEMS OF RADIO ENGINEERING, TELECOMMUNICATIONS AND COMPUTER SCIENCE, PROCEEDINGS, 2002, : 223 - 225
  • [44] Music Tempo Estimation With k-NN Regression
    Eronen, Antti J.
    Klapuri, Anssi P.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (01): : 50 - 57
  • [45] Optimization of K-NN by feature weight learning
    Shi, Q
    Lv, L
    Chen, H
    PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 2828 - 2831
  • [46] Distance Weighted Fuzzy k-NN SVM
    Cheng, Yi-Wen
    Wen, Te-Jen
    Cheng, Hui-Chuan
    Yang, Chan-Yun
    2016 IEEE 13TH INTERNATIONAL CONFERENCE ON NETWORKING, SENSING, AND CONTROL (ICNSC), 2016,
  • [47] A learning scheme for a fuzzy k-NN rule
    Jozwik, Adam
    PATTERN RECOGNITION LETTERS, 1983, 1 (5-6) : 287 - 289
  • [48] Scalable k-NN based text clustering
    Lulli, Alessandro
    Debatty, Thibault
    Dell'Amico, Matteo
    Michiardi, Pietro
    Ricci, Laura
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 958 - 963
  • [49] Connectionist learning of weights for k-NN retrieval
    Dhar, AR
    Khemani, D
    PROCEEDINGS OF THE 7TH JOINT CONFERENCE ON INFORMATION SCIENCES, 2003, : 1697 - 1700
  • [50] k-NN aggregation with a stacked email representation
    Orecchioni, Amandine
    Wiratunga, Nirmalie
    Massie, Stewart
    Craw, Susan
    ADVANCES IN CASE-BASED REASONING, PROCEEDINGS, 2008, 5239 : 415 - 429