Speeding up k-Nearest Neighbors classifier for large-scale multi-label learning on GPUs

Cited by: 27
Authors
Skryjomski, Przemyslaw [1 ]
Krawczyk, Bartosz [1 ]
Cano, Alberto [1 ]
Affiliations
[1] Virginia Commonwealth Univ, Sch Engn, Dept Comp Sci, 401 West Main St,POB 843019, Richmond, VA 23284 USA
Keywords
Machine learning; Multi-label classification; GPU computing; Large-scale data mining; KNN; MODEL;
DOI
10.1016/j.neucom.2018.06.095
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-label classification is one of the most dynamically growing fields of machine learning, due to its numerous real-life applications to problems whose instances can be described by multiple labels at the same time. While most works in this field focus on proposing novel and accurate classification algorithms, the issue of computational complexity on growing dataset sizes is often marginalized. Owing to ever-increasing data-capturing capabilities, we are faced with large-scale data mining problems that force learners to be not only highly accurate, but also fast and scalable in high-dimensional spaces of instances, features, and labels. In this paper, we propose a highly efficient parallel approach for computing the multi-label k-Nearest Neighbors classifier on GPUs. While this method is highly effective due to its accuracy and simplicity, its computational complexity makes it prohibitive for large-scale data. We propose a four-step implementation that takes advantage of the GPU architecture, allowing for an efficient execution of the multi-label k-Nearest Neighbors classifier without any loss of accuracy. Experiments carried out on a number of real and artificial benchmarks show that we achieve speedups of up to 200x compared to a sequential CPU execution, while scaling efficiently with varying numbers of instances and features. (C) 2019 Elsevier B.V. All rights reserved.
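The abstract describes parallelizing a multi-label k-Nearest Neighbors classifier; the paper's four-step GPU implementation is not reproduced in this record, but the underlying classifier can be sketched. Below is a minimal, hypothetical NumPy illustration (not the authors' code): the pairwise-distance computation is fully vectorized, which is exactly the part a GPU kernel would parallelize, and each test instance inherits a label when at least half of its k nearest neighbors carry it (a simple voting variant; ML-kNN proper uses a Bayesian posterior).

```python
import numpy as np

def multilabel_knn_predict(X_train, Y_train, X_test, k=3):
    """Brute-force multi-label kNN sketch (hypothetical helper, not the
    paper's GPU implementation). A test point receives label j when at
    least half of its k nearest neighbors carry label j."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    # The GPU version would parallelize exactly this computation.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest training points per test point.
    nn = np.argsort(d2, axis=1)[:, :k]
    # Fraction of the k neighbors holding each label, shape (n_test, n_labels).
    votes = Y_train[nn].mean(axis=1)
    return (votes >= 0.5).astype(int)

# Toy usage: two well-separated clusters with disjoint label sets.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
Y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
pred = multilabel_knn_predict(
    X_train, Y_train, np.array([[0.0, 0.5], [5.0, 5.5]]), k=2
)
# pred → [[1, 0], [0, 1]]
```

The O(n_test x n_train x d) distance matrix is what makes the sequential classifier prohibitive at scale, and why the paper targets it for GPU acceleration.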
Pages: 10-19
Page count: 10