K-Nearest Neighbor Search by Random Projection Forests

Cited by: 18
Authors
Yan, Donghui [1 ,2 ]
Wang, Yingjie [3 ]
Wang, Jin [3 ]
Wang, Honggang [3 ]
Li, Zhenpeng [4 ]
Affiliations
[1] Univ Massachusetts, Dept Math, Dartmouth, MA 02747 USA
[2] Univ Massachusetts, Program Data Sci, Dartmouth, MA 02747 USA
[3] Univ Massachusetts, Dept Elect & Comp Engn, Dartmouth, MA 02747 USA
[4] Dali Univ, Dept Math & Comp Sci, Dali 671003, Yunnan, Peoples R China
Keywords
Big Data; Vegetation; Computational complexity; Forestry; Data mining; Computers; Search problems; K-nearest neighbors; random projection forests; ensemble; unsupervised learning; ALGORITHMS;
DOI
10.1109/TBDATA.2019.2908178
CLC classification
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the success of tree-based methods and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees, each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves remarkable accuracy: both the missing rate of kNNs and the discrepancy in the k-th nearest neighbor distances decay rapidly with the ensemble size. As a tree-based method, rpForests has very low computational complexity. Its ensemble nature makes it easy to parallelize on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insight into rpForests by showing that the probability of neighboring points being separated by ensemble random projection trees decays exponentially as the ensemble size increases. Our theory can also be used to refine the choice of random projections during the growth of rpForests; experiments show that the effect is remarkable.
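The ensemble idea in the abstract — grow several trees by recursive random-projection splits, union the leaf candidates each tree returns for a query, then rank candidates by exact distance — can be sketched as follows. This is an illustrative reconstruction under simple assumptions (Gaussian projection directions, median splits, fixed leaf size), not the authors' exact rpForests construction; all function names are ours.

```python
import numpy as np

def build_rp_tree(X, indices, leaf_size=10, rng=None):
    """Recursively split a point set by projecting onto a random direction
    and cutting at the median projection. Leaves store point indices."""
    if rng is None:
        rng = np.random.default_rng()
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    direction = rng.standard_normal(X.shape[1])
    direction /= np.linalg.norm(direction)
    proj = X[indices] @ direction
    cut = np.median(proj)
    left, right = indices[proj <= cut], indices[proj > cut]
    if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
        return {"leaf": indices}
    return {"dir": direction, "cut": cut,
            "left": build_rp_tree(X, left, leaf_size, rng),
            "right": build_rp_tree(X, right, leaf_size, rng)}

def query_tree(tree, q):
    """Descend to the leaf that would contain query q; return its indices."""
    while "leaf" not in tree:
        tree = tree["left"] if q @ tree["dir"] <= tree["cut"] else tree["right"]
    return tree["leaf"]

def rp_forest_knn(X, q, k=5, n_trees=20, leaf_size=10, seed=0):
    """Union leaf candidates over an ensemble of random projection trees,
    then rank the candidates by exact Euclidean distance to q."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(X))
    candidates = set()
    for _ in range(n_trees):
        tree = build_rp_tree(X, all_idx, leaf_size, rng)
        candidates.update(query_tree(tree, q).tolist())
    cand = np.array(sorted(candidates))
    dists = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(dists)[:k]]
```

Each tree costs O(n log n) to build and O(log n) to query, and because trees are independent, the ensemble parallelizes trivially across cores — consistent with the near-inverse scaling in cores that the abstract describes. A true neighbor missed by one tree's partition is likely recovered by another, which is the source of the exponential decay in the missing rate as the ensemble grows.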
Pages: 147-157 (11 pages)