Furthest-Pair-Based Binary Search Tree for Speeding Big Data Classification Using K-Nearest Neighbors

被引:29
作者
Hassanat, Ahmad B. A. [1 ]
机构
[1] Mutah Univ, IT Dept, POB 1212, Al Karak 61710, Jordan
关键词
AI; algorithms; BST; d-dimensional data sets; diameter; machine learning; CLASSIFIERS;
D O I
10.1089/big.2018.0064
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The advances in information technology of both hardware and software have allowed big data to emerge recently, classification of such data is extremely slow, particularly when using K-nearest neighbors (KNN) classifier. In this article, we propose a new approach that creates a binary search tree (BST) to be used later by the KNN to speed up the big data classification. This approach is based on finding the furthest-pair of points (diameter) in a data set, and then, it uses this pair of points to sort the examples of the training data set into a BST. At each node of the BST, the furthest-pair is found and the examples located at that particular node are further sorted based on their distances to these local furthest points. The created BST is then searched for a test example to the leaf; the examples found in that particular leaf are used to classify the test example using the KNN classifier. The experimental results on some well-known machine learning data sets show the efficiency of the proposed method, in terms of speed and accuracy compared with the state-of-the-art methods reviewed. With some optimization, the proposed method has a great potential to be used for big data classification and can be generalized for other applications, particularly when classification speed is the main concern.
引用
收藏
页码:225 / 235
页数:11
相关论文
共 31 条
[1]   Farthest neighbors, maximum spanning trees and related problems in higher dimensions [J].
Agarwal, P.K. ;
Matousek, J. ;
Suri, S. .
Computational Geometry: Theory and Applications, 1992, 1 (04)
[2]  
Alkasassbeh M., 2015, CAN J PURE APPL SCI, V9, P3291, DOI 10.48550/arXiv.1501.00687
[3]  
[Anonymous], 2011, LIBSVM data: Classification, regression, andmulti-label
[4]  
[Anonymous], 2014, J AM SCI
[5]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[6]  
Beygelzimer A., 2006, ICML, DOI DOI 10.1145/1143844.1143857
[7]  
Bolon-Canedo V, 2017, EUR S ART NEUR NETW, P519
[8]  
Cislak A, 2014, ACSIS-ANN COMPUT SCI, V2, P93
[9]   NEAREST NEIGHBOR PATTERN CLASSIFICATION [J].
COVER, TM ;
HART, PE .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) :21-+
[10]   Efficient kNN classification algorithm for big data [J].
Deng, Zhenyun ;
Zhu, Xiaoshu ;
Cheng, Debo ;
Zong, Ming ;
Zhang, Shichao .
NEUROCOMPUTING, 2016, 195 :143-148