GPU-Accelerated Parallel Hierarchical Extreme Learning Machine on Flink for Big Data

Cited by: 83
Authors
Chen, Cen [1 ,2 ]
Li, Kenli [1 ,2 ]
Ouyang, Aijia [1 ,2 ,3 ]
Tang, Zhuo [1 ,2 ]
Li, Keqin [1 ,2 ,4 ]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China
[2] Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China
[3] Zunyi Normal Coll, Dept Informat Engn, Zunyi 563006, Peoples R China
[4] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2017, Vol. 47, No. 10
Funding
National Natural Science Foundation of China;
Keywords
Big data; deep learning (DL); Flink; GPGPU; hierarchical extreme learning machine (H-ELM); parallel; FEEDFORWARD NETWORKS; HIDDEN NODES; MAPREDUCE; APPROXIMATION; CLASSIFICATION; OPTIMIZATION; REGRESSION; ALGORITHM; SPMV;
DOI
10.1109/TSMC.2017.2690673
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
The extreme learning machine (ELM) has become one of the most important and popular machine learning algorithms because of its extremely fast training speed, good generalization, and universal approximation/classification capability. The hierarchical ELM (H-ELM) extends ELM from single-hidden-layer feedforward networks to multilayer perceptrons, greatly broadening the applicability of ELM. Training an H-ELM generally requires large-scale datasets (DSTs), so how to apply the H-ELM framework to big data processing merits further exploration. This paper proposes a parallel H-ELM algorithm based on Flink, an in-memory cluster computing platform, and graphics processing units (GPUs). Several optimizations are adopted to improve performance, such as a cache-based scheme, a reasonable partitioning strategy, and a memory-mapping scheme that maps specific Java virtual machine objects to buffers. Most importantly, the proposed framework for utilizing GPUs to accelerate Flink for big data is general: it can be used to accelerate many other variants of ELM as well as other machine learning algorithms. To the best of our knowledge, this is the first library that combines in-memory cluster computing with GPUs to parallelize H-ELM. The experimental results demonstrate that the proposed GPU-accelerated parallel H-ELM, named GPH-ELM, can efficiently process large-scale DSTs with good speedup and scalability, leveraging the computing power of both the CPUs and GPUs in the cluster.
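The abstract's claim of "extremely fast training" rests on the core ELM idea: hidden-layer weights are drawn at random and only the output weights are solved in closed form. The following is a minimal single-hidden-layer sketch of that idea (an illustration under stated assumptions, not the paper's GPH-ELM implementation; all function names here are hypothetical):

```python
# Minimal single-hidden-layer ELM sketch. The hidden weights W and
# biases b are random and never trained; only the output weights beta
# are computed, in closed form, via the Moore-Penrose pseudoinverse.
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden=64):
    """Fit an ELM: random hidden layer, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid hidden outputs
    beta = np.linalg.pinv(H) @ T                     # closed-form solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy regression: approximate y = sin(x) on [-3, 3].
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_train(X, T)
err = np.mean((elm_predict(X, W, b, beta) - T) ** 2)
```

The H-ELM described in the paper stacks several such randomized layers for feature encoding before a final ELM classifier; the closed-form solve above is the step that the parallel Flink/GPU design distributes over partitions of a large dataset.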
Pages: 2740-2753
Page Count: 14
Related Papers
38 records in total
  • [1] [Anonymous], 2016, FLINK PROGRAMMING GU
  • [2] [Anonymous], 2016, CUDNN
  • [3] [Anonymous], 2016, CUSPARSE PROGRAMMING
  • [4] [Anonymous], 2016, CUBLAS PROGRAMMING G
  • [5] A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
    Beck, Amir
    Teboulle, Marc
    [J]. SIAM JOURNAL ON IMAGING SCIENCES, 2009, 2 (01): : 183 - 202
  • [6] Representation Learning: A Review and New Perspectives
    Bengio, Yoshua
    Courville, Aaron
    Vincent, Pascal
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (08) : 1798 - 1828
  • [7] Learning Deep Architectures for AI
    Bengio, Yoshua
    [J]. FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2009, 2 (01): : 1 - 127
  • [8] GFlink: An In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data
    Chen, Cen
    Li, Kenli
    Ouyang, Aijia
    Tang, Zhuo
    Li, Keqin
    [J]. PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016, 2016, : 542 - 551
  • [9] CHOI JY, 1992, FRONTIERS 92 : THE FOURTH SYMPOSIUM ON THE FRONTIERS OF MASSIVELY PARALLEL COMPUTATION, P120, DOI 10.1109/FMPC.1992.234898
  • [10] Coates A., 2013, ICML, V28, P1337