The Effect of Points Dispersion on the k-nn Search in Random Projection Forests

被引：0

作者：

Alshammari, Mashaan ^{[1
]}

Stavrakakis, John ^{[2
]}

Ahmed, Adel F. ^{[3
]}

Takatsuka, Masahiro ^{[2
]}

机构：

[1] Univ Hail, Informat & Comp Sci Dept, Hail 55255, Saudi Arabia

[2] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia

[3] King Fahd Univ Petr & Minerals, Informat & Comp Sci Dept, Dhahran 31261, Saudi Arabia

来源：

IEEE ACCESS | 2022年 / 10卷

关键词：

k-nearest neighbor search; random projection trees; random projection forests; unsupervised learning; NEAREST-NEIGHBOR; ALGORITHMS;

D O I：

10.1109/ACCESS.2022.3195488

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Partitioning trees are efficient data structures for k-nearest neighbor search. Machine learning libraries commonly use a special type of partitioning trees called kd-trees to perform k-nn search. Unfortunately, kd-trees can be ineffective in high dimensions because they need more tree levels to decrease the vector quantization (VQ) error. Random projection trees rpTrees solve this scalability problem by using random directions to split the data. A collection of rpTrees is called rpForest. k-nn search in an rpForest is influenced by two factors: 1) the dispersion of points along the random direction and 2) the number of rpTrees in the rpForest. In this study, we investigate how these two factors affect the k-nn search with varying k values and different datasets. We found that with larger number of trees, the dispersion of points has a very limited effect on the k-nn search. One should use the original rpTree algorithm by picking a random direction regardless of the dispersion of points.

引用

页码：80858 / 80868

页数：11

共 31 条

[1] Principal component analysis
Abdi, Herve
Williams, Lynne J.
[J]. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2010, 2 (04): : 433 - 459
[2] Refining a k-nearest neighbor graph for a computationally efficient spectral clustering
Alshammari, Mashaan
Stavrakakis, John
Takatsuka, Masahiro
[J]. PATTERN RECOGNITION, 2021, 114
[3] Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Andoni, Alexandr
Indyk, Piotr
[J]. COMMUNICATIONS OF THE ACM, 2008, 51 (01) : 117 - 122
[4] MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING
BENTLEY, JL
[J]. COMMUNICATIONS OF THE ACM, 1975, 18 (09) : 509 - 517
[5] Buitinck Lars, 2013, ECML PKDD WORKSHOP L
[6] Randomized Partition Trees for Nearest Neighbor Search
Dasgupta, Sanjoy
Sinha, Kaushik
[J]. ALGORITHMICA, 2015, 72 (01) : 237 - 263
[7] Dasgupta S, 2008, ACM S THEORY COMPUT, P537
[8] Dua D., UCI Machine Learning Repository
[9] Fanti C, 2004, ADV NEUR IN, V16, P1603
[10] K-means properties on six clustering benchmark datasets
Franti, Pasi
Sieranoja, Sami
[J]. APPLIED INTELLIGENCE, 2018, 48 (12) : 4743 - 4759

← 1 2 3 4 →