FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks

Cited by: 279
Authors
Blott, Michaela [1 ]
Preusser, Thomas B. [1 ]
Fraser, Nicholas J. [1 ]
Gambardella, Giulio [1 ]
O'Brien, Kenneth [1 ]
Umuroglu, Yaman [1 ]
Leeser, Miriam [2 ]
Vissers, Kees [3 ]
Affiliations
[1] Xilinx Res, Xilinx Res Labs, 2020 Bianconi Ave, Citywest Business Campus, Dublin 24, Ireland
[2] Northeastern Univ, 316 Dana Res Ctr, 360 Huntington Ave, Boston, MA 02115 USA
[3] Xilinx Res, 2100 Logic Dr, San Jose, CA 95124 USA
Funding
US National Science Foundation; EU Horizon 2020;
Keywords
Neural network; artificial intelligence; FPGA; quantized neural networks; convolutional neural networks; FINN; inference; hardware accelerator;
DOI
10.1145/3242897
Chinese Library Classification (CLC)
TP3 [computing technology; computer technology];
Discipline code
0812;
Abstract
Convolutional Neural Networks have rapidly become the most successful machine-learning algorithm, enabling ubiquitous machine vision and intelligent decisions even on embedded computing systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One promising opportunity is leveraging reduced-precision representations for inputs, activations, and model parameters. The resulting scalability in performance, power efficiency, and storage footprint offers attractive design trade-offs in exchange for a small reduction in accuracy. FPGAs are ideal for exploiting low-precision inference engines, as custom precisions can be chosen to achieve the numerical accuracy required by a given application. In this article, we describe the second generation of the FINN framework, an end-to-end tool that enables design-space exploration and automates the creation of fully customized inference engines on FPGAs. Given a neural network description, the tool optimizes for a given platform, design target, and specific precision. We introduce formalizations of resource cost functions and performance predictions, and elaborate on the optimization algorithms. Finally, we evaluate a selection of reduced-precision neural networks, ranging from CIFAR-10 classifiers to YOLO-based object detection, on a range of platforms including PYNQ and AWS F1, demonstrating unprecedented measured throughput of 50 TOp/s on AWS F1 and 5 TOp/s on embedded devices.
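To make the abstract's notion of reduced-precision representations concrete, here is a minimal Python sketch of symmetric uniform quantization. The function name quantize_uniform and the specific scheme are illustrative assumptions for this record, not the actual quantizer implemented by FINN-R:

```python
import numpy as np

def quantize_uniform(x, bits):
    """Quantize tensor x onto a symmetric uniform grid with `bits` bits.

    Illustrative stand-in for the reduced-precision representations of
    inputs, activations, and weights discussed in the abstract; it is
    NOT the quantization scheme used by FINN-R itself.
    """
    levels = 2 ** (bits - 1) - 1              # e.g. bits=2 -> codes in {-1, 0, 1}
    scale = np.max(np.abs(x)) / levels        # map the value range onto the grid
    codes = np.clip(np.round(x / scale), -levels, levels)
    return codes * scale                      # dequantized approximation of x

# Example: 2-bit weights collapse to just a few distinct levels.
w = np.random.randn(4, 4).astype(np.float32)
print(np.unique(quantize_uniform(w, bits=2)))
```

Shrinking each operand to a few bits is what lets an FPGA datapath replace wide floating-point multipliers with far cheaper fixed-point or LUT-based logic, which is the performance and resource scaling the abstract describes.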
Pages: 23