Toward Multi-FPGA Acceleration of the Neural Networks

Cited by: 25
Authors
Biookaghazadeh, Saman [1 ]
Ravi, Pravin Kumar [1 ]
Zhao, Ming [1 ]
Affiliations
[1] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, 699 S Mill Ave, Tempe, AZ 85281 USA
Funding
U.S. National Science Foundation;
Keywords
FPGA; neural networks; distributed systems;
DOI
10.1145/3432816
Chinese Library Classification
TP3 [Computing Technology; Computer Technology];
Subject Classification Code
0812;
Abstract
High-throughput, low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits over other high-performance devices such as GPGPUs. However, current FPGA CNN-acceleration solutions are based on single-FPGA designs, so they are limited by the resources available on one FPGA, and they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., the C3D CNN) and achieves a near-linear speedup over the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4x and 3D accelerators by up to 1.7x.
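The 3D-convolution extension mentioned in the abstract (for video workloads such as C3D) slides a kernel over a temporal dimension in addition to the two spatial ones. As an illustration only, not the paper's OpenCL implementation, a naive single-channel 3D convolution can be sketched in Python/NumPy as follows (the function name and shapes are hypothetical):

```python
import numpy as np

def conv3d_naive(volume, kernel):
    """Valid-mode 3D convolution (cross-correlation, as in CNN
    frameworks) of a (T, H, W) volume with a (kt, kh, kw) kernel."""
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):          # temporal position
        for y in range(out.shape[1]):      # vertical position
            for x in range(out.shape[2]):  # horizontal position
                # Elementwise product of the kernel with the
                # current 3D window, summed to one output value.
                out[t, y, x] = np.sum(
                    volume[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

# Example: a 4x4x4 volume of ones with a 3x3x3 kernel of ones
# yields a 2x2x2 output where every element is 27.
video = np.ones((4, 4, 4))
result = conv3d_naive(video, np.ones((3, 3, 3)))
```

In an FPGA accelerator, these nested loops are unrolled and pipelined in hardware rather than executed sequentially; the sketch only shows the arithmetic being mapped.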
Pages: 23
Related Papers (28 total)
[1]  
Abadi M, 2016, ACM SIGPLAN NOTICES, V51, P1, DOI [10.1145/3022670.2976746, 10.1145/2951913.2976746]
[2]   Deep Reinforcement Learning A brief survey [J].
Arulkumaran, Kai ;
Deisenroth, Marc Peter ;
Brundage, Miles ;
Bharath, Anil Anthony .
IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) :26-38
[3]   An OpenCL(TM) Deep Learning Accelerator on Arria 10 [J].
Aydonat, Utku ;
O'Connell, Shane ;
Capalija, Davor ;
Ling, Andrew C. ;
Chiu, Gordon R. .
FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2017, :55-64
[4]  
Biookaghazadeh S, 2018, P USENIX WORKSH HOT, P1
[5]   You Cannot Improve What You Do not Measure: FPGA vs. ASIC Efficiency Gaps for Convolutional Neural Network Inference [J].
Boutros, Andrew ;
Yazdanshenas, Sadegh ;
Betz, Vaughn .
ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2018, 11 (03)
[6]   Learning Spatiotemporal Features with 3D Convolutional Networks [J].
Tran, Du ;
Bourdev, Lubomir ;
Fergus, Rob ;
Torresani, Lorenzo ;
Paluri, Manohar .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4489-4497
[7]   A Configurable Cloud-Scale DNN Processor for Real-Time AI [J].
Fowers, Jeremy ;
Ovtcharov, Kalin ;
Papamichael, Michael ;
Massengill, Todd ;
Liu, Ming ;
Lo, Daniel ;
Alkalay, Shlomi ;
Haselman, Michael ;
Adams, Logan ;
Ghandi, Mahdi ;
Heil, Stephen ;
Patel, Prerak ;
Sapek, Adam ;
Weisz, Gabriel ;
Woods, Lisa ;
Lanka, Sitaram ;
Reinhardt, Steven K. ;
Caulfield, Adrian M. ;
Chung, Eric S. ;
Burger, Doug .
2018 ACM/IEEE 45TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2018, :1-14
[8]  
Hegde K, 2018, 2018 51ST ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), P933, DOI [10.1109/MICR0.2018.00080, 10.1109/MICRO.2018.00080]
[9]  
Intel, FOG REFERENCE UNIT
[10]  
Intel, INTEL FPGA SDK OPEN