A Fast kNN Algorithm Using Multiple Space-Filling Curves

被引:7
作者
Barkalov, Konstantin [1 ]
Shtanyuk, Anton [1 ]
Sysoyev, Alexander [1 ]
机构
[1] Lobachevsky Univ, Dept Math Software & Supercomp Technol, Nizhnii Novgorod 603950, Russia
关键词
machine learning; kNN; dimensionality reduction; multiple space-filling curves; COMPRESSION; SEARCH;
D O I
10.3390/e24060767
中图分类号
O4 [物理学];
学科分类号
0702 ;
摘要
The paper considers a time-efficient implementation of the k nearest neighbours (kNN) algorithm. A well-known approach for accelerating the kNN algorithm is to utilise dimensionality reduction methods based on the use of space-filling curves. In this paper, we take this approach further and propose an algorithm that employs multiple space-filling curves and is faster (with comparable quality) compared with the kNN algorithm, which uses kd-trees to determine the nearest neighbours. A specific method for constructing multiple Peano curves is outlined, and statements are given about the preservation of object proximity information in the course of dimensionality reduction. An experimental comparison with known kNN implementations using kd-trees was performed using test and real-life data.
引用
收藏
页数:18
相关论文
共 26 条
[1]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[2]  
Bhatt R., Skin Segmentation Dataset, UCI Machine Learning Repository
[3]   Using Peano-Hilbert space filling curves for fast bidimensional ensemble EMD realization [J].
Costa, Paulo ;
Barroso, Joao ;
Fernandes, Hugo ;
Hadjileontiadis, Leontios J. .
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2012,
[4]   NEAREST NEIGHBOR PATTERN CLASSIFICATION [J].
COVER, TM ;
HART, PE .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) :21-+
[5]   ESTIMATION BY NEAREST NEIGHBOR RULE [J].
COVER, TM .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1968, 14 (01) :50-+
[6]   Multi-linearization data structure for image browsing [J].
Craver, S ;
Yeo, BL ;
Yeung, M .
STORAGE AND RETRIEVAL FOR IMAGE AND VIDEO DATABASES VII, 1998, 3656 :155-166
[7]  
Dua D., Car Evaluation Data Set
[8]   Computationally efficient approach for solving lexicographic multicriteria optimization problems [J].
Gergel, Victor ;
Kozinov, Evgeniy ;
Barkalov, Konstantin .
OPTIMIZATION LETTERS, 2021, 15 (07) :2469-2495
[9]   Trajectories, bifurcations, and pseudo-time in large clinical datasets: applications to myocardial infarction and diabetes data [J].
Golovenkin, Sergey E. ;
Bac, Jonathan ;
Chervov, Alexander ;
Mirkes, Evgeny M. ;
Orlova, Yuliya, V ;
Barillot, Emmanuel ;
Gorban, Alexander N. ;
Zinovyev, Andrei .
GIGASCIENCE, 2020, 9 (11) :1-20
[10]   Employing machine learning for theory validation and identification of experimental conditions in laserplasma physics [J].
Gonoskov, A. ;
Wallin, E. ;
Polovinkin, A. ;
Meyerov, I .
SCIENTIFIC REPORTS, 2019, 9 (1)