Deep learning algorithms have received significant attention in recent years. Their popularity stems from their ability to achieve higher accuracy than conventional machine learning in many research areas, such as speech recognition, image processing, and natural language processing. Deep learning algorithms rely on multiple cascaded layers of non-linear processing units, typically organized as hidden layers of artificial neural networks, for feature extraction and transformation. However, deep learning algorithms require a large amount of computational power and a significant amount of time to train. Fortunately, the training and inference algorithms of deep learning architectures expose abundant data parallelism. In this work, we aim to develop technology that exploits this data parallelism in two ways: 1) by distributing deep learning computation across a Hadoop cluster or cloud of computing nodes, and 2) by using field-programmable gate array (FPGA) hardware acceleration to speed up computationally intensive deep learning kernels. In this paper, we describe a hardware prototype of our accelerated Hadoop deep learning system architecture and report initial performance and energy-reduction results. By accelerating the convolutional layers of a deep convolutional neural network (CNN), we observed a potential speed-up of 12.6x and an energy reduction of 87.5% on a 6-node FPGA-accelerated Hadoop cluster.
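
To make concrete the kind of kernel targeted for FPGA offload, the following is a minimal sketch of a direct 2D convolution loop nest, the dominant computation in a CNN's convolutional layers. The function name, tensor layout, and dimensions here are illustrative assumptions, not the paper's actual accelerator implementation.

```c
#include <stddef.h>

/* Illustrative sketch (not the paper's kernel): a direct 2D convolution
 * over one input feature map producing one output feature map.
 *   in:  H x W input, row-major
 *   k:   K x K filter
 *   out: (H-K+1) x (W-K+1) output
 * The independent (oy, ox) iterations expose the data parallelism that
 * an FPGA accelerator can exploit by unrolling or pipelining the loops. */
void conv2d(const float *in, const float *k, float *out,
            size_t H, size_t W, size_t K)
{
    size_t OH = H - K + 1, OW = W - K + 1;
    for (size_t oy = 0; oy < OH; oy++) {
        for (size_t ox = 0; ox < OW; ox++) {
            float acc = 0.0f;
            for (size_t ky = 0; ky < K; ky++)
                for (size_t kx = 0; kx < K; kx++)
                    acc += in[(oy + ky) * W + (ox + kx)] * k[ky * K + kx];
            out[oy * OW + ox] = acc;
        }
    }
}
```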