Toward Multi-FPGA Acceleration of the Neural Networks

Cited by: 25
Authors
Biookaghazadeh, Saman [1 ]
Ravi, Pravin Kumar [1 ]
Zhao, Ming [1 ]
Affiliations
[1] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, 699 S Mill Ave, Tempe, AZ 85281 USA
Funding
U.S. National Science Foundation;
Keywords
FPGA; neural networks; distributed systems;
DOI
10.1145/3432816
Chinese Library Classification
TP3 [Computing Technology; Computer Technology];
Subject Classification Code
0812;
Abstract
High-throughput, low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits over other high-performance devices such as GPGPUs. However, current FPGA CNN-acceleration solutions are based on single-FPGA designs, so they are limited by the resources available on one FPGA, and they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., the C3D CNN) and achieves a near-linear speedup over the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4x and 3D accelerators by up to 1.7x.
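The 3D-convolution extension mentioned in the abstract (for video workloads such as C3D) slides a kernel over a temporal dimension in addition to the two spatial ones. As an illustration only, not the paper's OpenCL implementation, a naive single-channel 3D convolution can be sketched in Python/NumPy as follows (the function name and shapes are hypothetical):

```python
import numpy as np

def conv3d_naive(volume, kernel):
    """Valid-mode 3D convolution (cross-correlation, as in CNN
    frameworks) of a (T, H, W) volume with a (kt, kh, kw) kernel."""
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):          # temporal position
        for y in range(out.shape[1]):      # vertical position
            for x in range(out.shape[2]):  # horizontal position
                # Elementwise product of the kernel with the
                # current 3D window, summed to one output value.
                out[t, y, x] = np.sum(
                    volume[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

# Example: a 4x4x4 volume of ones with a 3x3x3 kernel of ones
# yields a 2x2x2 output where every element is 27.
video = np.ones((4, 4, 4))
result = conv3d_naive(video, np.ones((3, 3, 3)))
```

In an FPGA accelerator, these nested loops are unrolled and pipelined in hardware rather than executed sequentially; the sketch only shows the arithmetic being mapped.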
Pages: 23
Related Papers (28 total)
[1]  
Abadi M, 2016, ACM SIGPLAN NOTICES, V51, P1, DOI [10.1145/3022670.2976746, 10.1145/2951913.2976746]
[2]   Deep Reinforcement Learning A brief survey [J].
Arulkumaran, Kai ;
Deisenroth, Marc Peter ;
Brundage, Miles ;
Bharath, Anil Anthony .
IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) :26-38
[3]   An OpenCL(TM) Deep Learning Accelerator on Arria 10 [J].
Aydonat, Utku ;
O'Connell, Shane ;
Capalija, Davor ;
Ling, Andrew C. ;
Chiu, Gordon R. .
FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2017, :55-64
[4]  
Biookaghazadeh S, 2018, P USENIX WORKSH HOT, P1
[5]   You Cannot Improve What You Do not Measure: FPGA vs. ASIC Efficiency Gaps for Convolutional Neural Network Inference [J].
Boutros, Andrew ;
Yazdanshenas, Sadegh ;
Betz, Vaughn .
ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2018, 11 (03)
[6]   Learning Spatiotemporal Features with 3D Convolutional Networks [J].
Tran, Du ;
Bourdev, Lubomir ;
Fergus, Rob ;
Torresani, Lorenzo ;
Paluri, Manohar .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4489-4497
[7]   A Configurable Cloud-Scale DNN Processor for Real-Time AI [J].
Fowers, Jeremy ;
Ovtcharov, Kalin ;
Papamichael, Michael ;
Massengill, Todd ;
Liu, Ming ;
Lo, Daniel ;
Alkalay, Shlomi ;
Haselman, Michael ;
Adams, Logan ;
Ghandi, Mahdi ;
Heil, Stephen ;
Patel, Prerak ;
Sapek, Adam ;
Weisz, Gabriel ;
Woods, Lisa ;
Lanka, Sitaram ;
Reinhardt, Steven K. ;
Caulfield, Adrian M. ;
Chung, Eric S. ;
Burger, Doug .
2018 ACM/IEEE 45TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2018, :1-14
[8]  
Hegde K, 2018, 2018 51ST ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), P933, DOI [10.1109/MICR0.2018.00080, 10.1109/MICRO.2018.00080]
[9]  
Intel, FOG REFERENCE UNIT
[10]  
Intel, INTEL FPGA SDK OPEN