Furthest-Pair-Based Binary Search Tree for Speeding Big Data Classification Using K-Nearest Neighbors

被引:28
作者
Hassanat, Ahmad B. A. [1 ]
机构
[1] Mutah Univ, IT Dept, POB 1212, Al Karak 61710, Jordan
关键词
AI; algorithms; BST; d-dimensional data sets; diameter; machine learning; CLASSIFIERS;
D O I
10.1089/big.2018.0064
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The advances in information technology of both hardware and software have allowed big data to emerge recently, classification of such data is extremely slow, particularly when using K-nearest neighbors (KNN) classifier. In this article, we propose a new approach that creates a binary search tree (BST) to be used later by the KNN to speed up the big data classification. This approach is based on finding the furthest-pair of points (diameter) in a data set, and then, it uses this pair of points to sort the examples of the training data set into a BST. At each node of the BST, the furthest-pair is found and the examples located at that particular node are further sorted based on their distances to these local furthest points. The created BST is then searched for a test example to the leaf; the examples found in that particular leaf are used to classify the test example using the KNN classifier. The experimental results on some well-known machine learning data sets show the efficiency of the proposed method, in terms of speed and accuracy compared with the state-of-the-art methods reviewed. With some optimization, the proposed method has a great potential to be used for big data classification and can be generalized for other applications, particularly when classification speed is the main concern.
引用
收藏
页码:225 / 235
页数:11
相关论文
共 31 条
  • [1] Farthest neighbors, maximum spanning trees and related problems in higher dimensions
    Agarwal, P.K.
    Matousek, J.
    Suri, S.
    [J]. Computational Geometry: Theory and Applications, 1992, 1 (04):
  • [2] Alkasassbeh M., 2015, CAN J PURE APPL SCI, V9, P3291, DOI 10.48550/arXiv.1501.00687
  • [3] [Anonymous], 2011, LIBSVM data: Classification, regression, andmulti-label
  • [4] [Anonymous], 2014, J AM SCI
  • [5] MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING
    BENTLEY, JL
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (09) : 509 - 517
  • [6] Beygelzimer A., 2006, ICML, DOI DOI 10.1145/1143844.1143857
  • [7] Bolon-Canedo V, 2017, EUR S ART NEUR NETW, P519
  • [8] Cislak A, 2014, ACSIS-ANN COMPUT SCI, V2, P93
  • [9] NEAREST NEIGHBOR PATTERN CLASSIFICATION
    COVER, TM
    HART, PE
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) : 21 - +
  • [10] Efficient kNN classification algorithm for big data
    Deng, Zhenyun
    Zhu, Xiaoshu
    Cheng, Debo
    Zong, Ming
    Zhang, Shichao
    [J]. NEUROCOMPUTING, 2016, 195 : 143 - 148