High Performance Training of Deep Neural Networks Using Pipelined Hardware Acceleration and Distributed Memory

Times Cited: 0
Authors
Mehta, Ragav [1 ]
Huang, Yuyang [2 ]
Cheng, Mingxi [3 ]
Bagga, Shrey [4 ]
Mathur, Nishant [4 ]
Li, Ji [4 ]
Draper, Jeffrey [4 ]
Nazarian, Shahin [4 ]
Affiliations
[1] Mentor, Wilsonville, OR USA
[2] Nvidia, Shanghai, Peoples R China
[3] Duke Univ, Durham, NC USA
[4] Univ Southern Calif, Los Angeles, CA 90007 USA
Source
2018 19TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED) | 2018
Keywords
Deep learning; neural network; hardware design; MACHINE
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Code
0808; 0809
Abstract
Recently, Deep Neural Networks (DNNs) have made unprecedented progress in a wide range of tasks. However, there is a timely need to accelerate the DNN training process, particularly for real-time applications that demand high performance, energy efficiency, and compactness. Numerous algorithms have been proposed to improve accuracy; however, the network training process remains computationally slow. In this paper, we present a scalable pipelined hardware architecture with distributed memories for a digital neuron to implement deep neural networks. We also explore various functions and algorithms, as well as different memory topologies, to optimize the performance of our training architecture. The power, area, and delay of our proposed model are evaluated against a software implementation. Experimental results on the MNIST dataset demonstrate that, compared with software training, our proposed hardware-based training approach achieves a 33X runtime reduction, a 5X power reduction, and nearly a 168X energy reduction.
Pages: 383 - 388
Number of pages: 6
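
The record above gives only a high-level summary, so the following Python sketch illustrates the kind of fixed-point multiply-accumulate (MAC) datapath and hardware-friendly activation that a digital neuron of this sort typically implements. The Q8.8 format, the piecewise-linear sigmoid, and all names below are assumptions made for illustration; they are not taken from the paper.

# Illustrative sketch only: a software model of a fixed-point MAC-plus-activation
# neuron. The Q8.8 format and piecewise-linear sigmoid are assumptions for
# illustration, not the architecture described in the paper.

FRAC_BITS = 8  # assumed Q8.8 fixed-point format


def to_fixed(x: float) -> int:
    """Quantize a real value to Q8.8 fixed point."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(a: int) -> float:
    """Convert a Q8.8 value back to a real number (for inspection only)."""
    return a / float(1 << FRAC_BITS)


def fixed_mul(a: int, b: int) -> int:
    """Fixed-point multiply with rescaling, as a hardware multiplier stage would do."""
    return (a * b) >> FRAC_BITS


def pwl_sigmoid(acc: int) -> int:
    """Piecewise-linear sigmoid approximation, a common hardware-friendly choice."""
    x = to_float(acc)
    if x <= -4.0:
        y = 0.0
    elif x >= 4.0:
        y = 1.0
    else:
        y = 0.5 + x / 8.0  # single linear segment between -4 and 4
    return to_fixed(y)


def neuron_forward(weights, inputs):
    """One neuron: accumulate weight*input products, then apply the activation."""
    acc = 0
    for w, x in zip(weights, inputs):  # each iteration stands in for one MAC operation
        acc += fixed_mul(to_fixed(w), to_fixed(x))
    return pwl_sigmoid(acc)


if __name__ == "__main__":
    # Toy usage: a 4-input neuron with arbitrary weights.
    out = neuron_forward([0.5, -0.25, 1.0, 0.125], [1.0, 0.5, -0.5, 2.0])
    print(out, to_float(out))  # fixed-point code and its real-valued equivalent

In a pipelined hardware realization, each MAC would map to its own stage fed from a local (distributed) weight memory so that a new operand pair can be accepted every cycle; the loop above models only the arithmetic, not the stage timing.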