POLYBiNN: Binary Inference Engine for Neural Networks using Decision Trees

Cited by: 2
Authors
Abdelsalam, Ahmed M. [1 ]
Elsheikh, Ahmed [2 ]
Chidambaram, Sivakumar [3 ]
David, Jean-Pierre [3 ]
Langlois, J. M. Pierre [1 ]
Affiliations
[1] Polytech Montreal, Dept Comp & Software Engn, Montreal, PQ, Canada
[2] Polytech Montreal, Dept Math & Ind Engn, Montreal, PQ, Canada
[3] Polytech Montreal, Dept Elect Engn, Montreal, PQ, Canada
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2020, Vol. 92, Issue 1
Keywords
Deep learning; FPGAs; Decision trees; Hardware accelerators; Binary classifiers;
DOI
10.1007/s11265-019-01453-w
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) have gained significant popularity in many classification and regression applications. The massive computation and memory requirements of DNN and CNN architectures pose particular challenges for their FPGA implementation. Moreover, programming FPGAs requires hardware-specific knowledge that many machine-learning researchers do not possess. To make the power and versatility of FPGAs available to a wider deep learning user community, and to improve DNN design efficiency, we introduce POLYBiNN, an efficient FPGA-based inference engine for DNNs and CNNs. POLYBiNN is composed of a stack of decision trees, which are inherently binary classifiers, and it uses AND-OR gates instead of multipliers and accumulators. POLYBiNN is a memory-free inference engine that drastically cuts hardware costs. We also propose a tool that automatically generates a low-level hardware description of a trained POLYBiNN for a given application. We evaluate POLYBiNN and the tool on several datasets that are normally handled with fully connected layers. On the MNIST dataset, when implemented on a ZYNQ-7000 ZC706 FPGA, the system achieves a throughput of up to 100 million image classifications per second with 90 ns latency and 97.26% accuracy. Moreover, POLYBiNN consumes 8x less power than the best previously published implementations, and it does not require any memory access. We also show how POLYBiNN can replace the fully connected layers of a CNN and apply this approach to the CIFAR-10 dataset.
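The abstract's central idea, replacing multiply-accumulate arithmetic with AND-OR logic over binary inputs, can be illustrated with a small software sketch. The snippet below is a hypothetical, minimal illustration and not the authors' actual tool or hardware mapping: each decision-tree leaf is modeled as an AND term over selected input bits, and a class output is the OR of the leaf terms that vote for it.

```python
# Minimal sketch of decision-tree inference as AND-OR logic over binary inputs.
# Hypothetical illustration only; POLYBiNN's tree training and hardware
# generation flow are described in the paper itself.

from typing import List, Tuple

# A "term" is an AND of literals: (bit_index, expected_bit_value).
Term = List[Tuple[int, int]]

def and_term(x: List[int], term: Term) -> bool:
    """AND gate: true only if every selected input bit matches its literal."""
    return all(x[i] == v for i, v in term)

def classify(x: List[int], class_terms: List[List[Term]]) -> int:
    """Per-class OR of AND terms; the class with the most firing terms wins."""
    scores = [sum(and_term(x, t) for t in terms) for terms in class_terms]
    return max(range(len(scores)), key=lambda c: scores[c])

# Toy example: 4 binary inputs, 2 classes (terms chosen arbitrarily).
class_terms = [
    [[(0, 1), (1, 0)], [(2, 1), (3, 1)]],  # class 0: (x0 AND NOT x1) OR (x2 AND x3)
    [[(0, 0), (2, 0)]],                    # class 1: (NOT x0 AND NOT x2)
]

print(classify([1, 0, 0, 0], class_terms))  # -> 0
print(classify([0, 1, 0, 1], class_terms))  # -> 1
```

In a hardware realization of this style of classifier, each term would map to an AND gate over the selected input bits and each class output to an OR of its terms, which is why no multipliers, accumulators, or weight memory accesses are needed.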
Pages: 95-107
Number of pages: 13