Norm-Based Binary Search Trees for Speeding Up KNN Big Data Classification

被引:25
作者
Hassanat, Ahmad B. A. [1 ]
机构
[1] Mutah Univ, Informat Technol Coll, Al Karak 61710, Jordan
关键词
Big Data classification; machine learning datasets; binary search tree; norms;
D O I
10.3390/computers7040054
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Due to their large sizes and/or dimensions, the classification of Big Data is a challenging task using traditional machine learning, particularly if it is carried out using the well-known K-nearest neighbors classifier (KNN) classifier, which is a slow and lazy classifier by its nature. In this paper, we propose a new approach to Big Data classification using the KNN classifier, which is based on inserting the training examples into a binary search tree to be used later for speeding up the searching process for test examples. For this purpose, we used two methods to sort the training examples. The first calculates the minimum/maximum scaled norm and rounds it to 0 or 1 for each example. Examples with 0-norms are sorted in the left-child of a node, and those with 1-norms are sorted in the right child of the same node; this process continues recursively until we obtain one example or a small number of examples with the same norm in a leaf node. The second proposed method inserts each example into the binary search tree based on its similarity to the examples of the minimum and maximum Euclidean norms. The experimental results of classifying several machine learning big datasets show that both methods are much faster than most of the state-of-the-art methods compared, with competing accuracy rates obtained by the second method, which shows great potential for further enhancements of both methods to be used in practice.
引用
收藏
页数:14
相关论文
共 18 条
  • [1] Altarawneh G.A., 2015, CAN J PURE APPL SCI, V9, P3291
  • [2] [Anonymous], 2014, J AM SCI
  • [3] MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING
    BENTLEY, JL
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (09) : 509 - 517
  • [4] Beygelzimer Alina, 2006, ICML
  • [5] Bolon-Canedo V., 2017, P EUR S ART NEUR NET, P517
  • [6] Cislak A., 2014, P FED C COMP SCI INF
  • [7] Efficient kNN classification algorithm for big data
    Deng, Zhenyun
    Zhu, Xiaoshu
    Cheng, Debo
    Zong, Ming
    Zhang, Shichao
    [J]. NEUROCOMPUTING, 2016, 195 : 143 - 148
  • [8] Fan Rong-En, LIBSVM DATA CLASSIFI
  • [9] Clustering-based k-nearest neighbor classification for large-scale data with neural codes representation
    Gallego, Antonio-Javier
    Calvo-Zaragoza, Jorge
    Valero-Mas, Jose J.
    Rico-Juan, Juan R.
    [J]. PATTERN RECOGNITION, 2018, 74 : 531 - 543
  • [10] Hassanat A.B., 2014, ARXIV14090919