A Pipelined and Scalable Dataflow Implementation of Convolutional Neural Networks on FPGA

Cited by: 14

Authors:

Bacis, Marco [1]

Natale, Giuseppe [1]

Del Sozzo, Emanuele [1]

Santambrogio, Marco Domenico [1]

Affiliations:

[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milan, Italy

Source:

2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) | 2017

Keywords:

Field Programmable Gate Arrays; Convolutional Neural Networks; Dataflow Architectures; COPROCESSOR; PERFORMANCE;

DOI:

10.1109/IPDPSW.2017.44

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

A Convolutional Neural Network (CNN) is a deep learning algorithm that extends the Artificial Neural Network (ANN) and is widely used for image classification and recognition, thanks to its invariance to distortions. The recent rapid growth of applications based on deep learning algorithms, especially in the context of Big Data analytics, has dramatically increased both industrial and academic research into optimized implementations of CNNs on accelerators such as GPUs, FPGAs and ASICs, as general-purpose processors can hardly meet the ever-increasing performance and energy-efficiency requirements. FPGAs in particular are one of the most attractive alternatives, as they allow the exploitation of the implicit parallelism of the algorithm and the acceleration of the different layers of a CNN with custom optimizations, while retaining extreme flexibility thanks to their reconfigurability. In this work, we propose a methodology to implement CNNs on FPGAs in a modular, scalable way. This is done by exploiting the dataflow pattern of convolutions, using an approach derived from previous work on the acceleration of Iterative Stencil Loops (ISLs), a computational pattern that shares some characteristics with convolutions. Furthermore, this approach allows the implementation of a high-level pipeline between the different network layers, resulting in an increase in overall performance when the CNN is employed to process batches of multiple images, as would happen in real-life scenarios.
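The claim that a pipeline between layers raises throughput on batches can be illustrated with a minimal timing model. This is a sketch under stated assumptions, not the authors' implementation: each layer is treated as one pipeline stage with a fixed, deterministic latency, and the function names and the values in `layer_times` are hypothetical, not measurements from the paper.

```python
from typing import Sequence


def sequential_time(stage_times: Sequence[float], batch_size: int) -> float:
    """Total time when each image traverses every layer before the next image starts."""
    return batch_size * sum(stage_times)


def pipelined_time(stage_times: Sequence[float], batch_size: int) -> float:
    """Total time when layers overlap: fill the pipeline once, then
    complete one image per interval of the slowest stage."""
    if batch_size == 0:
        return 0.0
    return sum(stage_times) + (batch_size - 1) * max(stage_times)


# Hypothetical per-layer latencies in arbitrary time units (assumption).
layer_times = [4.0, 2.0, 3.0, 1.0]

for n in (1, 8, 64):
    seq = sequential_time(layer_times, n)
    pipe = pipelined_time(layer_times, n)
    print(f"batch={n:3d}  sequential={seq:7.1f}  pipelined={pipe:7.1f}  speedup={seq / pipe:.2f}x")
```

In this model a single image gains nothing from pipelining, while for large batches the speedup approaches the ratio of the summed layer latencies to the slowest layer's latency, which is why balancing the layers matters.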


Pages: 90-97

Page count: 8
