Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Cited by: 147
Authors
Zhang, Chen [1,2,3]
Fang, Zhenman [2]
Zhou, Peipei [2]
Pan, Peichen [3]
Cong, Jason [1,2,3]
Affiliations
[1] Peking Univ, Ctr Energy Efficient Comp & Applicat, Beijing, Peoples R China
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[3] Falcon Comp Inc, Los Angeles, CA USA
Keywords
COPROCESSOR;
DOI
10.1145/2966986.2967011
Chinese Library Classification (CLC)
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
With the recent advancement of multilayer convolutional neural networks (CNNs), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. First, we propose a uniformed convolutional matrix-multiplication representation for both computation-intensive convolutional layers and communication-intensive fully connected (FCN) layers. Second, we design Caffeine with the goal of maximizing the underlying FPGA computing and bandwidth resource utilization, with a key focus on bandwidth optimization through memory access reorganization not studied in prior work. Moreover, we implement Caffeine in portable high-level synthesis and provide various hardware/software definable parameters for user configuration. Finally, we also integrate Caffeine into the industry-standard software deep learning framework Caffe. We evaluate Caffeine and its integration with Caffe by implementing the VGG16 and AlexNet networks on multiple FPGA platforms. Caffeine achieves a peak performance of 365 GOPS on the Xilinx KU060 FPGA and 636 GOPS on the Virtex7 690t FPGA. To the best of our knowledge, these are the best published results. We achieve more than 100x speedup on FCN layers over previous FPGA accelerators. An end-to-end evaluation with Caffe integration shows up to 7.3x and 43.5x performance and energy gains over Caffe on a 12-core Xeon server, and 1.5x better energy efficiency over the GPU implementation on a medium-sized FPGA (KU060). Performance projections for a system with a high-end FPGA (Virtex7 690t) show even higher gains.
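The abstract's central idea, a "uniformed convolutional matrix-multiplication representation", can be illustrated by the standard im2col transformation: a convolutional layer is unfolded into one dense matrix product, the same primitive that serves fully connected layers. The sketch below is a generic illustration of that concept, not Caffeine's actual implementation; all function and variable names here are illustrative.

```python
import numpy as np

def im2col_conv(x, w, stride=1):
    """Express a convolutional layer as one dense matrix multiplication.

    Illustrative sketch (not Caffeine's implementation) of mapping
    convolution onto the matmul primitive shared with FCN layers.

    x: input feature maps, shape (C_in, H, W)
    w: filters, shape (C_out, C_in, K, K)
    returns: output feature maps, shape (C_out, H_out, W_out)
    """
    c_in, h, wd = x.shape
    c_out, _, k, _ = w.shape
    h_out = (h - k) // stride + 1
    w_out = (wd - k) // stride + 1

    # Unfold each KxK receptive field into one column of the matrix.
    cols = np.empty((c_in * k * k, h_out * w_out))
    col = 0
    for i in range(0, h - k + 1, stride):
        for j in range(0, wd - k + 1, stride):
            cols[:, col] = x[:, i:i + k, j:j + k].ravel()
            col += 1

    # One dense matmul covers the whole layer. A fully connected layer
    # is the degenerate case K == H == W, where h_out = w_out = 1 and
    # this reduces to an ordinary matrix-vector product.
    out = w.reshape(c_out, -1) @ cols
    return out.reshape(c_out, h_out, w_out)
```

Because both layer types reduce to the same matmul kernel, a single hardware engine can serve the compute-bound convolutional layers and the bandwidth-bound FCN layers, which is the unification the paper exploits.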
Pages: 8
Related Papers
50 records in total
  • [21] Numerosity representation in a deep convolutional neural network
    Zhou, Cihua
    Xu, Wei
    Liu, Yujie
    Xue, Zhichao
    Chen, Rui
    Zhou, Ke
    Liu, Jia
    JOURNAL OF PACIFIC RIM PSYCHOLOGY, 2021, 15
  • [22] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
    Zhang, Ying
    Pezeshki, Mohammad
    Brakel, Philemon
    Zhang, Saizheng
    Laurent, Cesar
    Bengio, Yoshua
    Courville, Aaron
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
  • [23] Towards Automatic Prostate Gleason Grading via Deep Convolutional Neural Networks
    Khani, Ali Asghar
    Jahromi, Seyed Alireza Fatemi
    Shahreza, Hatef Otroshi
    Behroozi, Hamid
    Baghshah, Mahdieh Soleymani
    2019 5TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS 2019), 2019,
  • [24] Deep Anchored Convolutional Neural Networks
    Huang, Jiahui
    Dwivedi, Kshitij
    Roig, Gemma
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 639 - 647
  • [25] DEEP CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
    Sainath, Tara N.
    Mohamed, Abdel-rahman
    Kingsbury, Brian
    Ramabhadran, Bhuvana
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8614 - 8618
  • [26] Deep Unitary Convolutional Neural Networks
    Chang, Hao-Yuan
    Wang, Kang L.
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT II, 2021, 12892 : 170 - 181
  • [27] Universality of deep convolutional neural networks
    Zhou, Ding-Xuan
    APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS, 2020, 48 (02) : 787 - 794
  • [28] A Review on Deep Convolutional Neural Networks
    Aloysius, Neena
    Geetha, M.
    2017 INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), 2017, : 588 - 592
  • [29] SELF-REPRESENTATION CONVOLUTIONAL NEURAL NETWORKS
    Gao, Hongchao
    Wang, Xi
    Li, Yujia
    Han, Jizhong
    Hu, Songlin
    Li, Ruixuan
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1672 - 1677
  • [30] Spatial deep convolutional neural networks
    Wang, Qi
    Parker, Paul A.
    Lund, Robert
    SPATIAL STATISTICS, 2025, 66