Furthest-Pair-Based Binary Search Tree for Speeding Big Data Classification Using K-Nearest Neighbors

被引：29

作者：

Hassanat, Ahmad B. A. ^{[1
]}

机构：

[1] Mutah Univ, IT Dept, POB 1212, Al Karak 61710, Jordan

来源：

BIG DATA | 2018年 / 6卷 / 03期

关键词：

AI; algorithms; BST; d-dimensional data sets; diameter; machine learning; CLASSIFIERS;

D O I：

10.1089/big.2018.0064

中图分类号：

TP39 [计算机的应用];

学科分类号：

081203 ; 0835 ;

摘要：

The advances in information technology of both hardware and software have allowed big data to emerge recently, classification of such data is extremely slow, particularly when using K-nearest neighbors (KNN) classifier. In this article, we propose a new approach that creates a binary search tree (BST) to be used later by the KNN to speed up the big data classification. This approach is based on finding the furthest-pair of points (diameter) in a data set, and then, it uses this pair of points to sort the examples of the training data set into a BST. At each node of the BST, the furthest-pair is found and the examples located at that particular node are further sorted based on their distances to these local furthest points. The created BST is then searched for a test example to the leaf; the examples found in that particular leaf are used to classify the test example using the KNN classifier. The experimental results on some well-known machine learning data sets show the efficiency of the proposed method, in terms of speed and accuracy compared with the state-of-the-art methods reviewed. With some optimization, the proposed method has a great potential to be used for big data classification and can be generalized for other applications, particularly when classification speed is the main concern.

引用

页码：225 / 235

页数：11

共 31 条

[1] Farthest neighbors, maximum spanning trees and related problems in higher dimensions [J].

Agarwal, P.K. ;

Matousek, J. ;

Suri, S. .

Computational Geometry: Theory and Applications, 1992, 1 (04)

[2]

Alkasassbeh M., 2015, CAN J PURE APPL SCI, V9, P3291, DOI 10.48550/arXiv.1501.00687

[3]

[Anonymous], 2011, LIBSVM data: Classification, regression, andmulti-label

[4]

[Anonymous], 2014, J AM SCI

[5] MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].

BENTLEY, JL .

COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517

[6]

Beygelzimer A., 2006, ICML, DOI DOI 10.1145/1143844.1143857

[7]

Bolon-Canedo V, 2017, EUR S ART NEUR NETW, P519

[8]

Cislak A, 2014, ACSIS-ANN COMPUT SCI, V2, P93

[9] NEAREST NEIGHBOR PATTERN CLASSIFICATION [J].

COVER, TM ;

HART, PE .

IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) :21-+

[10] Efficient kNN classification algorithm for big data [J].

Deng, Zhenyun ;

Zhu, Xiaoshu ;

Cheng, Debo ;

Zong, Ming ;

Zhang, Shichao .

NEUROCOMPUTING, 2016, 195 :143-148

← 1 2 3 4 →