Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Cited by: 147
Authors
Zhang, Chen [1,2,3]
Fang, Zhenman [2 ]
Zhou, Peipei [2 ]
Pan, Peichen [3 ]
Cong, Jason [1 ,2 ,3 ]
Affiliations
[1] Peking Univ, Ctr Energy Efficient Comp & Applicat, Beijing, Peoples R China
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[3] Falcon Comp Inc, Los Angeles, CA USA
Keywords
COPROCESSOR;
DOI
10.1145/2966986.2967011
CLC Number
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
With the recent advancement of multilayer convolutional neural networks (CNNs), deep learning has achieved remarkable success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of computation-demanding CNNs, FPGA-based acceleration has emerged as one of the most attractive alternatives. In this paper we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. First, we propose a uniformed convolutional matrix-multiplication representation for both the computation-intensive convolutional layers and the communication-intensive fully connected (FCN) layers. Second, we design Caffeine to maximize the utilization of the underlying FPGA computing and bandwidth resources, with a key focus on bandwidth optimization through memory access reorganization, which was not studied in prior work. Moreover, we implement Caffeine in portable high-level synthesis and provide various hardware/software definable parameters for user configuration. Finally, we integrate Caffeine into the industry-standard deep learning framework Caffe. We evaluate Caffeine and its Caffe integration by implementing the VGG16 and AlexNet networks on multiple FPGA platforms. Caffeine achieves a peak performance of 365 GOPS on a Xilinx KU060 FPGA and 636 GOPS on a Virtex7 690t FPGA, which is, to the best of our knowledge, the best published result. We achieve more than 100x speedup on FCN layers over previous FPGA accelerators. An end-to-end evaluation with the Caffe integration shows up to 7.3x performance and 43.5x energy gains over Caffe on a 12-core Xeon server, and 1.5x better energy efficiency than a GPU implementation on a medium-sized FPGA (KU060). Performance projections for a system with a high-end FPGA (Virtex7 690t) show even higher gains.
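The "uniformed convolutional matrix-multiplication representation" described above maps both layer types onto a single GEMM kernel. A minimal sketch of that idea is the standard im2col transformation, under which a convolutional layer becomes one matrix multiplication and an FC layer is the degenerate case (a 1x1 window over a 1x1 "feature map"). The function names below are illustrative, not from the paper, and the sketch omits the paper's actual contributions (tiling, bandwidth-oriented memory access reorganization, and the FPGA implementation):

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll all kh-by-kw sliding windows of input x with shape
    (C, H, W) into columns, so convolution becomes one GEMM.
    Stride 1, no padding, for simplicity."""
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, oh * ow))
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols  # shape: (C*KH*KW, OH*OW)

def conv_as_matmul(x, weights):
    """weights has shape (M, C, KH, KW); the conv layer reduces to
    one (M, C*KH*KW) x (C*KH*KW, OH*OW) matrix multiplication."""
    m, c, kh, kw = weights.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    w_mat = weights.reshape(m, c * kh * kw)
    out = w_mat @ im2col(x, kh, kw)
    return out.reshape(m, oh, ow)

def fc_as_matmul(x_vec, w_fc):
    """An FC layer is already a matrix-vector product, i.e. the same
    GEMM kernel with a 1-column 'im2col' matrix."""
    return w_fc @ x_vec
```

In hardware terms, this unification lets one systolic/GEMM engine serve both the compute-bound convolutional layers and the bandwidth-bound FC layers, which is why the paper's bandwidth optimizations matter so much for the FC case.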
Pages: 8
Related Papers
50 records total
  • [1] Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
    Zhang, Chen
    Sun, Guangyu
    Fang, Zhenman
    Zhou, Peipei
    Pan, Peichen
    Cong, Jason
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, 38 (11) : 2072 - 2085
  • [2] Towards Acceleration of Deep Convolutional Neural Networks using Stochastic Computing
    Li, Ji
    Ren, Ao
    Li, Zhe
    Ding, Caiwen
    Yuan, Bo
    Qiu, Qinru
    Wang, Yanzhi
    2017 22ND ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2017, : 115 - 120
  • [3] Towards Better Analysis of Deep Convolutional Neural Networks
    Liu, Mengchen
    Shi, Jiaxin
    Li, Zhen
    Li, Chongxuan
    Zhu, Jun
    Liu, Shixia
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2017, 23 (01) : 91 - 100
  • [4] Towards a component-based acceleration of convolutional neural networks on FPGAs
    Kwadjo, Danielle Tchuinkou
    Tchinda, Erman Nghonda
    Mbongue, Joel Mandebi
    Bobda, Christophe
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2022, 167 : 123 - 135
  • [5] Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks
    Wang, Xiaowei
    Yu, Jiecao
    Augustine, Charles
    Iyer, Ravi
    Das, Reetuparna
    2019 25TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2019, : 81 - 93
  • [6] Acceleration of Deep Convolutional Neural Networks Using Adaptive Filter Pruning
    Singh, Pravendra
    Verma, Vinay Kumar
    Rai, Piyush
    Namboodiri, Vinay P.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (04) : 838 - 847
  • [7] Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
    He, Yang
    Ding, Yuhang
    Liu, Ping
    Zhu, Linchao
    Zhang, Hanwang
    Yang, Yi
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 2006 - 2015
  • [8] EEG Representation in Deep Convolutional Neural Networks for Classification of Motor Imagery
    Robinson, Neethu
    Lee, Seong-Whan
    Guan, Cuntai
    2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019, : 1322 - 1326
  • [9] Filter pruning via annealing decaying for deep convolutional neural networks acceleration
    Huang, Jiawen
    Xiong, Liyan
    Huang, Xiaohui
    Chen, Qingsen
    Huang, Peng
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2025, 28 (02):
  • [10] Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration
    He, Yang
    Liu, Ping
    Wang, Ziwei
    Hu, Zhilan
    Yang, Yi
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 4335 - 4344