K-Nearest Neighbor Search by Random Projection Forests

被引:18
|
作者
Yan, Donghui [1 ,2 ]
Wang, Yingjie [3 ]
Wang, Jin [3 ]
Wang, Honggang [3 ]
Li, Zhenpeng [4 ]
机构
[1] Univ Massachusetts, Dept Math, Dartmouth, MA 02747 USA
[2] Univ Massachusetts, Program Data Sci, Dartmouth, MA 02747 USA
[3] Univ Massachusetts, Dept Elect & Comp Engn, Dartmouth, MA 02747 USA
[4] Dali Univ, Dept Math & Comp Sci, Dali 671003, Yunnan, Peoples R China
关键词
Big Data; Vegetation; Computational complexity; Forestry; Data mining; Computers; Search problems; K-nearest neighbors; random projection forests; ensemble; unsupervised learning; ALGORITHMS;
D O I
10.1109/TBDATA.2019.2908178
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the huge success of tree-based methodology and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees with each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves a remarkable accuracy in terms of fast decaying missing rate of kNNs and that of discrepancy in the k-th nearest neighbor distances. rpForests has a very low computational complexity as a tree-based methodology. The ensemble nature of rpForests makes it easily parallelized to run on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights on rpForests by showing the exponential decay of neighboring points being separated by ensemble random projection trees when the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests; experiments show that the effect is remarkable.
引用
收藏
页码:147 / 157
页数:11
相关论文
共 50 条
  • [1] K-nearest Neighbor Search by Random Projection Forests
    Yan, Donghui
    Wang, Yingjie
    Wang, Jin
    Wang, Honggang
    Li, Zhenpeng
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 4775 - 4781
  • [2] Efficient k-nearest neighbor search on moving object trajectories
    Gueting, Ralf Hartmut
    Behr, Thomas
    Xu, Jianqiu
    VLDB JOURNAL, 2010, 19 (05) : 687 - 714
  • [3] K-Nearest Neighbor Search in Peer-to-Peer Systems
    Mashayekhi, Hoda
    Habibi, Jafar
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON ADVANCES IN P2P SYSTEMS (AP2PS 2010), 2010, : 100 - 105
  • [4] Comparative Analysis of K-Nearest Neighbor and Modified K-Nearest Neighbor Algorithm for Data Classification
    Okfalisa
    Mustakim
    Gazalba, Ikbal
    Reza, Nurul Gayatri Indah
    2017 2ND INTERNATIONAL CONFERENCES ON INFORMATION TECHNOLOGY, INFORMATION SYSTEMS AND ELECTRICAL ENGINEERING (ICITISEE): OPPORTUNITIES AND CHALLENGES ON BIG DATA FUTURE INNOVATION, 2017, : 294 - 298
  • [5] Boosting k-nearest neighbor classifier by means of input space projection
    Garcia-Pedrajas, Nicolas
    Ortiz-Boyer, Domingo
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (07) : 10570 - 10582
  • [6] Map Reduce by K-Nearest Neighbor Joins
    Bethu, Srikanth
    Babu, B. Sankara
    Rao, S. Govinda
    Florence, R. Aruna
    2018 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC 2018), 2018, : 222 - 231
  • [7] Applying an efficient k-nearest neighbor search to forest attribute imputation
    Department of Forest Resources, University of Minnesota, 115 Green Hall, 1530 Cleveland Ave. North, St. Paul, MN 55108, United States
    不详
    For. Sci., 2006, 2 (130-135):
  • [8] Hybrid k-Nearest Neighbor Classifier
    Yu, Zhiwen
    Chen, Hantao
    Liu, Jiming
    You, Jane
    Leung, Hareton
    Han, Guoqiang
    IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (06) : 1263 - 1275
  • [9] The Effect of Points Dispersion on the k-nn Search in Random Projection Forests
    Alshammari, Mashaan
    Stavrakakis, John
    Ahmed, Adel F.
    Takatsuka, Masahiro
    IEEE ACCESS, 2022, 10 : 80858 - 80868
  • [10] IMPROVING K-NEAREST NEIGHBOR EFFICIENCY FOR TEXT CATEGORIZATION
    Barigou, F.
    NEURAL NETWORK WORLD, 2016, 26 (01) : 45 - 65