Work-in-Progress: A Power-Efficient and High Performance FPGA Accelerator for Convolutional Neural Networks

Cited by: 6

Authors:

Gong, Lei [1]

Wang, Chao [1]

Li, Xi [1]

Chen, Huaping [1]

Zhou, Xuehai [1]

Affiliations:

[1] University of Science and Technology of China, School of Computer Science and Technology, Hefei, Anhui, People's Republic of China

Source:

2017 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) | 2017

Keywords:

CNNs; FPGA-based Accelerator; Power Efficient; Pipelines;

DOI:

10.1145/3125502.3125534

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Recently, FPGAs have been widely used to implement hardware accelerators for Convolutional Neural Networks (CNNs), especially on mobile and embedded devices. However, most existing accelerators follow the same concept as their ASIC counterparts: all operations from different CNN layers are mapped to the same hardware units and executed in a multiplexed way. Although this approach improves the generality of these accelerators, it does not take full advantage of the reconfigurability and customizability of FPGAs, which degrades computational efficiency, and the degradation is even worse on embedded platforms. In this paper, we propose an FPGA-based CNN accelerator in which every layer is mapped to its own on-chip unit and all units work concurrently as a pipeline. A strategy that finds the optimized parallelization scheme for each layer is proposed to eliminate pipeline stalls and achieve high resource utilization. In addition, a balanced pruning-based method is applied to the fully connected (FC) layers to reduce computational redundancy. As a case study, we implement a widely used CNN model, LeNet-5, on an embedded FPGA device, the Xilinx Zedboard. It achieves a peak performance of 39.78 GOP/s and a power efficiency of 19.6 GOP/s/W, which outperforms previous approaches.
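The abstract does not spell out how the per-layer parallelization scheme is chosen. As a rough illustration of the underlying idea only, the sketch below assumes a simple model in which each layer's stage latency is its operation count divided by the number of compute units assigned to it, and it greedily hands each spare unit to the currently slowest stage. The function names, the latency model, the greedy heuristic, and the operation counts are all hypothetical and are not taken from the paper.

```python
import math


def stage_latencies(ops_per_layer, units):
    """Cycles per stage under the assumed model: ceil(ops / units)."""
    return [math.ceil(ops / u) for ops, u in zip(ops_per_layer, units)]


def balance_pipeline(ops_per_layer, total_units):
    """Greedy sketch (not the paper's method): start with one unit per
    layer, then give each remaining unit to the slowest stage."""
    if total_units < len(ops_per_layer):
        raise ValueError("need at least one compute unit per layer")
    units = [1] * len(ops_per_layer)
    for _ in range(total_units - len(ops_per_layer)):
        latencies = stage_latencies(ops_per_layer, units)
        slowest = latencies.index(max(latencies))
        units[slowest] += 1
    return units


# Hypothetical per-layer operation counts, not LeNet-5 figures.
ops = [100, 200, 100]
units = balance_pipeline(ops, total_units=8)
print(units)                        # [2, 4, 2]
print(stage_latencies(ops, units))  # [50, 50, 50]
```

In a layer-per-unit pipeline the throughput is set by the slowest stage, so equalizing stage latencies in this way is what keeps faster stages from stalling while they wait on a slower one.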

References

Pages: 2

5 entries in total

[1] [Anonymous], FPGA 2016

[2] [Anonymous], INT C COMP AID DES I

[3] Mao JC, 2017, DES AUT TEST EUROPE, P1396, DOI 10.23919/DATE.2017.7927211

[4] Wang, Chao; Gong, Lei; Yu, Qi; Li, Xi; Xie, Yuan; Zhou, Xuehai. DLAU: A Scalable Deep Learning Accelerator Unit on FPGA [J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2017, 36 (03): 513-517

[5] Zhang C, 2015, P 2015 ACM SIGDA INT, P161, DOI 10.1145/2684746.2689060
