K-Nearest Neighbor Search by Random Projection Forests

被引:18
|
作者
Yan, Donghui [1 ,2 ]
Wang, Yingjie [3 ]
Wang, Jin [3 ]
Wang, Honggang [3 ]
Li, Zhenpeng [4 ]
机构
[1] Univ Massachusetts, Dept Math, Dartmouth, MA 02747 USA
[2] Univ Massachusetts, Program Data Sci, Dartmouth, MA 02747 USA
[3] Univ Massachusetts, Dept Elect & Comp Engn, Dartmouth, MA 02747 USA
[4] Dali Univ, Dept Math & Comp Sci, Dali 671003, Yunnan, Peoples R China
关键词
Big Data; Vegetation; Computational complexity; Forestry; Data mining; Computers; Search problems; K-nearest neighbors; random projection forests; ensemble; unsupervised learning; ALGORITHMS;
D O I
10.1109/TBDATA.2019.2908178
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the huge success of tree-based methodology and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees with each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves a remarkable accuracy in terms of fast decaying missing rate of kNNs and that of discrepancy in the k-th nearest neighbor distances. rpForests has a very low computational complexity as a tree-based methodology. The ensemble nature of rpForests makes it easily parallelized to run on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights on rpForests by showing the exponential decay of neighboring points being separated by ensemble random projection trees when the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests; experiments show that the effect is remarkable.
引用
收藏
页码:147 / 157
页数:11
相关论文
共 50 条
  • [41] Multi-GPU Implementation of k-Nearest Neighbor Algorithm
    Masek, Jan
    Burget, Kadim
    Karasek, Jan
    Uher, Vaclav
    Dutta, Malay Kishore
    2015 38TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2015, : 764 - 767
  • [42] Parallel Computation of k-Nearest Neighbor Joins Using MapReduce
    Kim, Wooyeol
    Kim, Younghoon
    Shim, Kyuseok
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 696 - 705
  • [43] Multi-GPU algorithm for k-nearest neighbor problem
    Kato, Kimikazu
    Hosino, Tikara
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2012, 24 (01) : 45 - 53
  • [44] Effective k-nearest neighbor models for data classification enhancement
    Ali A. Amer
    Sri Devi Ravana
    Riyaz Ahamed Ariyaluran Habeeb
    Journal of Big Data, 12 (1)
  • [45] A new globally adaptive k-nearest neighbor classifier based on local mean optimization
    Pan, Zhibin
    Pan, Yiwei
    Wang, Yidi
    Wang, Wei
    SOFT COMPUTING, 2021, 25 (03) : 2417 - 2431
  • [46] RACEkNN: A hybrid approach for improving the effectiveness of the k-nearest neighbor algorithm
    Ebrahimi, Mahdiyeh
    Basiri, Alireza
    KNOWLEDGE-BASED SYSTEMS, 2024, 301
  • [47] Sublinear time approximation of the cost of a metric k-nearest neighbor graph
    Czumaj, Artur
    Sohler, Christian
    PROCEEDINGS OF THE THIRTY-FIRST ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA'20), 2020, : 2973 - 2992
  • [48] Fall Detection by Using K-Nearest Neighbor Algorithm on WSN Data
    Erdogan, Senol Zafer
    Bilgin, Turgay Tugay
    Cho, Juphil
    2010 IEEE GLOBECOM WORKSHOPS, 2010, : 2054 - 2058
  • [49] Efficient k-nearest neighbors search in graph space
    Abu-Aisheh, Zeina
    Raveaux, Romain
    Ramel, Jean-Yves
    PATTERN RECOGNITION LETTERS, 2020, 134 (134) : 77 - 86
  • [50] A Hierarchical K-Nearest Neighbor Approach for Volume of Tissue Activated Estimation
    De la Pava, I.
    Mejia, J.
    Alvarez-Meza, A.
    Alvarez, M.
    Orozco, A.
    Henao, O.
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2016, 2017, 10125 : 125 - 133