Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Cited by: 147
Authors
Zhang, Chen [1,2,3]
Fang, Zhenman [2 ]
Zhou, Peipei [2 ]
Pan, Peichen [3 ]
Cong, Jason [1 ,2 ,3 ]
Affiliations
[1] Peking Univ, Ctr Energy Efficient Comp & Applicat, Beijing, Peoples R China
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[3] Falcon Comp Inc, Los Angeles, CA USA
Keywords
COPROCESSOR;
DOI
10.1145/2966986.2967011
Chinese Library Classification
TP301 [Theory and Methods];
Subject Classification Code
081202;
Abstract
With the recent advancement of multilayer convolutional neural networks (CNN), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration has emerged as one of the most attractive alternatives. In this paper we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. First, we propose a uniformed convolutional matrix-multiplication representation for both the computation-intensive convolutional layers and the communication-intensive fully connected (FCN) layers. Second, we design Caffeine to maximize the utilization of the underlying FPGA computing and bandwidth resources, with a key focus on bandwidth optimization through memory access reorganization, which has not been studied in prior work. Moreover, we implement Caffeine in portable high-level synthesis and expose various hardware/software definable parameters for user configuration. Finally, we integrate Caffeine into the industry-standard deep learning framework Caffe. We evaluate Caffeine and its Caffe integration by implementing the VGG16 and AlexNet networks on multiple FPGA platforms. Caffeine achieves a peak performance of 365 GOPS on a Xilinx KU060 FPGA and 636 GOPS on a Virtex7 690t FPGA; to the best of our knowledge, these are the best published results. We achieve more than 100x speedup on FCN layers over previous FPGA accelerators. An end-to-end evaluation with the Caffe integration shows up to 7.3x performance and 43.5x energy gains over Caffe on a 12-core Xeon server, and 1.5x better energy efficiency than a GPU implementation on a medium-sized FPGA (KU060). Performance projections for a system with a high-end FPGA (Virtex7 690t) show even higher gains.
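The unification the abstract describes, expressing both convolutional and fully connected layers as one matrix multiplication so a single accelerator engine can serve both, can be sketched with the standard im2col transformation. The code below is an illustrative NumPy sketch of that representation, not Caffeine's HLS implementation; the names im2col, conv_as_matmul, and fcn_as_matmul are hypothetical.

import numpy as np

def im2col(x, kh, kw, stride=1):
    # Unroll each (kh x kw) sliding window of x (shape: C_in x H x W)
    # into one column of a (C_in*kh*kw, H_out*W_out) matrix.
    c, h, w = x.shape
    h_out = (h - kh) // stride + 1
    w_out = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, h_out * w_out))
    idx = 0
    for i in range(0, h - kh + 1, stride):
        for j in range(0, w - kw + 1, stride):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def conv_as_matmul(x, weights, stride=1):
    # A convolutional layer as a single matrix multiplication:
    # (C_out, C_in*kh*kw) @ (C_in*kh*kw, H_out*W_out).
    c_out, c_in, kh, kw = weights.shape
    out = weights.reshape(c_out, -1) @ im2col(x, kh, kw, stride)
    h_out = (x.shape[1] - kh) // stride + 1
    w_out = (x.shape[2] - kw) // stride + 1
    return out.reshape(c_out, h_out, w_out)

def fcn_as_matmul(batch, weights):
    # An FCN layer is the degenerate case: each input is a single 1x1
    # "window", so the layer is a plain matrix product. Batching input
    # vectors (batch: N_in x B) lets the weights be reused across the
    # batch, echoing the bandwidth concern the abstract raises.
    return weights @ batch  # (N_out, N_in) @ (N_in, B) -> (N_out, B)

In this shared form, both layer types reduce to the same matrix-multiplication kernel; the difference is that convolution reuses each weight across many output positions while an unbatched FCN layer streams each weight exactly once, which is why the abstract singles out FCN layers as communication-intensive.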
Pages: 8
Related Papers
50 items in total
  • [41] FPAR: filter pruning via attention and rank enhancement for deep convolutional neural networks acceleration
    Chen, Yanming
    Wu, Gang
    Shuai, Mingrui
    Lou, Shubin
    Zhang, Yiwen
    An, Zhulin
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (07) : 2973 - 2985
  • [42] COLOR REPRESENTATION IN DEEP NEURAL NETWORKS
    Engilberge, Martin
    Collins, Edo
    Susstrunk, Sabine
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 2786 - 2790
  • [43] The Representation of Speech in Deep Neural Networks
    Scharenborg, Odette
    van der Gouw, Nikki
    Larson, Martha
    Marchiori, Elena
    MULTIMEDIA MODELING, MMM 2019, PT II, 2019, 11296 : 194 - 205
  • [44] TOWARDS GRADING GLEASON SCORE USING GENERICALLY TRAINED DEEP CONVOLUTIONAL NEURAL NETWORKS
    Kallen, Hanna
    Molin, Jesper
    Heyden, Anders
    Lundstrom, Claes
    Astrom, Kalle
    2016 IEEE 13TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI), 2016, : 1163 - 1167
  • [45] TOWARDS DEEP UNSUPERVISED SAR DESPECKLING WITH BLIND-SPOT CONVOLUTIONAL NEURAL NETWORKS
    Molini, Andrea Bordone
    Valsesia, Diego
    Fracastoro, Giulia
    Magli, Enrico
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 2507 - 2510
  • [46] Towards resource-frugal deep convolutional neural networks for hyperspectral image segmentation
    Nalepa, Jakub
    Antoniak, Marek
    Myller, Michal
    Lorenzo, Pablo Ribalta
    Marcinkiewicz, Michal
    MICROPROCESSORS AND MICROSYSTEMS, 2020, 73
  • [47] Towards End-to-End Speech Recognition with Deep Multipath Convolutional Neural Networks
    Zhang, Wei
    Zhai, Minghao
    Huang, Zilong
    Liu, Chen
    Li, Wei
    Cao, Yi
    INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2019, PART VI, 2019, 11745 : 332 - 341
  • [48] Towards Understanding the Invertibility of Convolutional Neural Networks
    Gilbert, Anna C.
    Zhang, Yi
    Lee, Kibok
    Zhang, Yuting
    Lee, Honglak
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1703 - 1710
  • [49] Towards Robust Compressed Convolutional Neural Networks
    Wijayanto, Arie Wahyu
    Choong, Jun Jin
    Madhawa, Kaushalya
    Murata, Tsuyoshi
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2019, : 168 - 175
  • [50] Towards dropout training for convolutional neural networks
    Wu, Haibing
    Gu, Xiaodong
    NEURAL NETWORKS, 2015, 71 : 1 - 10