Speeding up k-Nearest Neighbors classifier for large-scale multi-label learning on GPUs

Cited by: 27
Authors
Skryjomski, Przemyslaw [1 ]
Krawczyk, Bartosz [1 ]
Cano, Alberto [1 ]
Affiliations
[1] Virginia Commonwealth Univ, Sch Engn, Dept Comp Sci, 401 West Main St,POB 843019, Richmond, VA 23284 USA
Keywords
Machine learning; Multi-label classification; GPU computing; Large-scale data mining; KNN; MODEL;
DOI
10.1016/j.neucom.2018.06.095
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-label classification is one of the most dynamically growing fields of machine learning, due to its numerous real-life applications to problems whose instances can be described by multiple labels at the same time. While most works in this field focus on proposing novel and accurate classification algorithms, the issue of computational complexity on growing dataset sizes is often marginalized. Owing to ever-increasing data-capturing capabilities, we are faced with large-scale data mining problems that force learners to be not only highly accurate, but also fast and scalable in high-dimensional spaces of instances, features, and labels. In this paper, we propose a highly efficient parallel approach for computing the multi-label k-Nearest Neighbors classifier on GPUs. While this method is highly effective due to its accuracy and simplicity, its computational complexity makes it prohibitive for large-scale data. We propose a four-step implementation that takes advantage of the GPU architecture, allowing for an efficient execution of the multi-label k-Nearest Neighbors classifier without any loss of accuracy. Experiments carried out on a number of real and artificial benchmarks show that we achieve speedups of up to 200x compared to a sequential CPU execution, while scaling efficiently with varying numbers of instances and features. (C) 2019 Elsevier B.V. All rights reserved.
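The abstract describes parallelizing a multi-label k-Nearest Neighbors classifier; the paper's four-step GPU implementation is not reproduced in this record, but the underlying classifier can be sketched. Below is a minimal, hypothetical NumPy illustration (not the authors' code): the pairwise-distance computation is fully vectorized, which is exactly the part a GPU kernel would parallelize, and each test instance inherits a label when at least half of its k nearest neighbors carry it (a simple voting variant; ML-kNN proper uses a Bayesian posterior).

```python
import numpy as np

def multilabel_knn_predict(X_train, Y_train, X_test, k=3):
    """Brute-force multi-label kNN sketch (hypothetical helper, not the
    paper's GPU implementation). A test point receives label j when at
    least half of its k nearest neighbors carry label j."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    # The GPU version would parallelize exactly this computation.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest training points per test point.
    nn = np.argsort(d2, axis=1)[:, :k]
    # Fraction of the k neighbors holding each label, shape (n_test, n_labels).
    votes = Y_train[nn].mean(axis=1)
    return (votes >= 0.5).astype(int)

# Toy usage: two well-separated clusters with disjoint label sets.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
Y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
pred = multilabel_knn_predict(
    X_train, Y_train, np.array([[0.0, 0.5], [5.0, 5.5]]), k=2
)
# pred → [[1, 0], [0, 1]]
```

The O(n_test x n_train x d) distance matrix is what makes the sequential classifier prohibitive at scale, and why the paper targets it for GPU acceleration.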
Pages: 10-19
Page count: 10