FPGA-Accelerated Hadoop Cluster for Deep Learning Computations

Cited by: 12
Authors
Alhamali, Abdulrahman [1 ]
Salha, Nibal [1 ]
Morcel, Raghid [1 ]
Ezzeddine, Mazen [1 ]
Hamdan, Omar [1 ]
Akkary, Haitham [1 ]
Hajj, Hazem [1 ]
Affiliations
[1] Amer Univ Beirut, Elect & Comp Engn Dept, Beirut, Lebanon
Source
2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW) | 2015
Keywords
deep learning; convolutional neural network; Hadoop; FPGA; map-reduce
DOI
10.1109/ICDMW.2015.148
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep learning algorithms have received significant attention in the last few years. Their popularity is due to their ability to achieve higher accuracy than conventional machine learning in many research areas such as speech recognition, image processing, and natural language processing. Deep learning algorithms rely on multiple cascaded layers of non-linear processing units, typically composed of hidden artificial neural network layers, for feature extraction and transformation. However, deep learning algorithms require large amounts of computational power and significant time to train. Fortunately, the training and inference algorithms of deep learning architectures expose abundant data parallelism. In this work, we aim to develop technology that exploits deep learning data parallelism in two ways: 1) by distributing the deep learning computation across a Hadoop cluster or cloud of computing nodes, and 2) by using field-programmable gate array (FPGA) hardware acceleration to speed up computationally intensive deep learning kernels. In this paper, we describe a hardware prototype of our accelerated Hadoop deep learning system architecture and report initial performance and energy-reduction results. By accelerating the convolutional layers of a deep learning Convolutional Neural Network (CNN), we observed a potential speed-up of 12.6 times and an energy reduction of 87.5% on a 6-node FPGA-accelerated Hadoop cluster.
Pages: 565-574
Number of pages: 10
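The compute-intensive kernel the abstract describes offloading to the FPGA is the 2D convolution at the heart of a CNN's convolutional layers. Below is a minimal pure-Java software reference for that kernel, a sketch rather than the authors' implementation: the class name Conv2D, the dimensions, and the valid-convolution (no padding) choice are illustrative assumptions. On an FPGA, this multiply-accumulate loop nest is typically unrolled into a pipelined MAC array, which is broadly where such accelerators obtain their speed-up.

    // Illustrative software reference for the 2D convolution kernel a
    // CNN's convolutional layers spend most of their time in; this is
    // NOT the paper's FPGA implementation, only a sketch of the math.
    public class Conv2D {
        // Valid convolution: output is (H-K+1) x (W-K+1) for a KxK kernel.
        static float[][] convolve(float[][] input, float[][] kernel) {
            int h = input.length, w = input[0].length, k = kernel.length;
            float[][] out = new float[h - k + 1][w - k + 1];
            for (int i = 0; i < out.length; i++) {
                for (int j = 0; j < out[0].length; j++) {
                    float acc = 0f;
                    // The k*k multiply-accumulate inner loops dominate the
                    // runtime; this is the loop nest an FPGA pipelines.
                    for (int u = 0; u < k; u++)
                        for (int v = 0; v < k; v++)
                            acc += input[i + u][j + v] * kernel[u][v];
                    out[i][j] = acc;
                }
            }
            return out;
        }

        public static void main(String[] args) {
            // Toy 8x8 input and a simple 3x3 vertical-edge filter.
            float[][] img = new float[8][8];
            for (int i = 0; i < 8; i++)
                for (int j = 0; j < 8; j++)
                    img[i][j] = i + j;
            float[][] k = {{1, 0, -1}, {1, 0, -1}, {1, 0, -1}};
            System.out.println(convolve(img, k)[0][0]); // prints -6.0
        }
    }

In the distributed setting the abstract outlines, each Hadoop map task would run this kind of kernel on its local data shard, with the FPGA standing in for the inner loop nest; the map-reduce layer supplies the second, coarser level of data parallelism.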