The Effect of Points Dispersion on the k-nn Search in Random Projection Forests

被引:0
作者
Alshammari, Mashaan [1 ]
Stavrakakis, John [2 ]
Ahmed, Adel F. [3 ]
Takatsuka, Masahiro [2 ]
机构
[1] Univ Hail, Informat & Comp Sci Dept, Hail 55255, Saudi Arabia
[2] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia
[3] King Fahd Univ Petr & Minerals, Informat & Comp Sci Dept, Dhahran 31261, Saudi Arabia
来源
IEEE ACCESS | 2022年 / 10卷
关键词
k-nearest neighbor search; random projection trees; random projection forests; unsupervised learning; NEAREST-NEIGHBOR; ALGORITHMS;
D O I
10.1109/ACCESS.2022.3195488
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Partitioning trees are efficient data structures for k-nearest neighbor search. Machine learning libraries commonly use a special type of partitioning trees called kd-trees to perform k-nn search. Unfortunately, kd-trees can be ineffective in high dimensions because they need more tree levels to decrease the vector quantization (VQ) error. Random projection trees rpTrees solve this scalability problem by using random directions to split the data. A collection of rpTrees is called rpForest. k-nn search in an rpForest is influenced by two factors: 1) the dispersion of points along the random direction and 2) the number of rpTrees in the rpForest. In this study, we investigate how these two factors affect the k-nn search with varying k values and different datasets. We found that with larger number of trees, the dispersion of points has a very limited effect on the k-nn search. One should use the original rpTree algorithm by picking a random direction regardless of the dispersion of points.
引用
收藏
页码:80858 / 80868
页数:11
相关论文
共 31 条
  • [1] Principal component analysis
    Abdi, Herve
    Williams, Lynne J.
    [J]. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2010, 2 (04): : 433 - 459
  • [2] Refining a k-nearest neighbor graph for a computationally efficient spectral clustering
    Alshammari, Mashaan
    Stavrakakis, John
    Takatsuka, Masahiro
    [J]. PATTERN RECOGNITION, 2021, 114
  • [3] Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
    Andoni, Alexandr
    Indyk, Piotr
    [J]. COMMUNICATIONS OF THE ACM, 2008, 51 (01) : 117 - 122
  • [4] MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING
    BENTLEY, JL
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (09) : 509 - 517
  • [5] Buitinck Lars, 2013, ECML PKDD WORKSHOP L
  • [6] Randomized Partition Trees for Nearest Neighbor Search
    Dasgupta, Sanjoy
    Sinha, Kaushik
    [J]. ALGORITHMICA, 2015, 72 (01) : 237 - 263
  • [7] Dasgupta S, 2008, ACM S THEORY COMPUT, P537
  • [8] Dua D., UCI Machine Learning Repository
  • [9] Fanti C, 2004, ADV NEUR IN, V16, P1603
  • [10] K-means properties on six clustering benchmark datasets
    Franti, Pasi
    Sieranoja, Sami
    [J]. APPLIED INTELLIGENCE, 2018, 48 (12) : 4743 - 4759