K-Nearest Neighbor Search by Random Projection Forests

被引:18
|
作者
Yan, Donghui [1 ,2 ]
Wang, Yingjie [3 ]
Wang, Jin [3 ]
Wang, Honggang [3 ]
Li, Zhenpeng [4 ]
机构
[1] Univ Massachusetts, Dept Math, Dartmouth, MA 02747 USA
[2] Univ Massachusetts, Program Data Sci, Dartmouth, MA 02747 USA
[3] Univ Massachusetts, Dept Elect & Comp Engn, Dartmouth, MA 02747 USA
[4] Dali Univ, Dept Math & Comp Sci, Dali 671003, Yunnan, Peoples R China
关键词
Big Data; Vegetation; Computational complexity; Forestry; Data mining; Computers; Search problems; K-nearest neighbors; random projection forests; ensemble; unsupervised learning; ALGORITHMS;
D O I
10.1109/TBDATA.2019.2908178
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the huge success of tree-based methodology and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees with each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves a remarkable accuracy in terms of fast decaying missing rate of kNNs and that of discrepancy in the k-th nearest neighbor distances. rpForests has a very low computational complexity as a tree-based methodology. The ensemble nature of rpForests makes it easily parallelized to run on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights on rpForests by showing the exponential decay of neighboring points being separated by ensemble random projection trees when the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests; experiments show that the effect is remarkable.
引用
收藏
页码:147 / 157
页数:11
相关论文
共 50 条
  • [21] A Road Network Embedding Technique for K-Nearest Neighbor Search in Moving Object Databases
    Cyrus Shahabi
    Mohammad R. Kolahdouzan
    Mehdi Sharifzadeh
    GeoInformatica, 2003, 7 : 255 - 273
  • [22] Neighbor-weighted K-nearest neighbor for unbalanced text corpus
    Tan, SB
    EXPERT SYSTEMS WITH APPLICATIONS, 2005, 28 (04) : 667 - 671
  • [23] Fast k-Nearest Neighbor Searching in Static Objects
    Jae Moon Lee
    Wireless Personal Communications, 2017, 93 : 147 - 160
  • [24] The k-Nearest Neighbor Algorithm Using MapReduce Paradigm
    Anchalia, Prajesh P.
    Roy, Kaushik
    PROCEEDINGS FIFTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS, MODELLING AND SIMULATION, 2014, : 513 - 518
  • [25] Distributed and Joint Evidential K-Nearest Neighbor Classification
    Gong, Chaoyu
    Demmel, Jim
    You, Yang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (11) : 5972 - 5985
  • [26] Dynamic data structures for k-nearest neighbor queries
    de Berg, Sarita
    Staals, Frank
    COMPUTATIONAL GEOMETRY-THEORY AND APPLICATIONS, 2023, 111
  • [27] K-nearest neighbor classification based on influence function
    College of Information Engineering, Zhengzhou University, Zhengzhou
    450052, China
    Dianzi Yu Xinxi Xuebao, 7 (1626-1632): : 1626 - 1632
  • [28] Fast k-Nearest Neighbor Searching in Static Objects
    Lee, Jae Moon
    WIRELESS PERSONAL COMMUNICATIONS, 2017, 93 (01) : 147 - 160
  • [29] K-Nearest Neighbor Classification for Glass Identification Problem
    Aldayel, Mashael S.
    2012 INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND INDUSTRIAL INFORMATICS (ICCSII), 2012,
  • [30] Parallel K-Nearest Neighbor Implementation on Multicore Processors
    Halkarnikar, P. P.
    Chougale, Ananda P.
    Khandagale, H. P.
    Kulkarni, P. P.
    2012 INTERNATIONAL CONFERENCE ON RADAR, COMMUNICATION AND COMPUTING (ICRCC), 2012, : 221 - 223