Low Bitwidth CNN Accelerator on FPGA Using Winograd and Block Floating Point Arithmetic

Cited by: 5
Authors
Wong, Yuk [1 ]
Dong, Zhenjiang [2 ]
Zhang, Wei [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept ECE, Hong Kong, Peoples R China
[2] HiSilicon, Res Dept, Shenzhen, Peoples R China
Source
2021 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2021) | 2021
DOI: 10.1109/ISVLSI51109.2021.00048
CLC number: TP3 [Computing technology; computer technology]
Subject classification code: 0812
Abstract
Convolutional neural networks (CNNs) have achieved performance near or exceeding that of humans in computer vision, yet their large computational and memory requirements make them difficult to deploy in both large data centers and embedded systems. CNNs are typically trained in floating point to accommodate the large dynamic range of values encountered during training. However, this large range of values comes with increased resource usage, power consumption, and latency. Fixed-point quantization is promising for reducing the resource requirements of CNNs, yet low-bitwidth implementations require fine-tuning to recover accuracy. In this work, we propose a CNN accelerator that uses block floating point (BFP) to reduce the bitwidth to 10 bits and supports the Winograd filtering algorithm. We compare our design with a baseline FP16 design and show that our block floating point quantization reduces LUT usage by 50.1%, register usage by 48.3%, BRAM usage by 27.3%, and DSP usage by 43.8%, while achieving a 32.1% higher frequency. Finally, we perform case studies with different CNNs and show that the accuracy drop is within 1% of the FP32 network.
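To make the BFP idea in the abstract concrete: block floating point groups values into blocks that share one exponent, so each value needs only a fixed-point mantissa. The sketch below is an illustrative software model under assumed conventions (the function name `bfp_quantize`, the signed-mantissa range, and round-to-nearest with saturation are all my assumptions, not the paper's hardware design):

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=10):
    """Quantize a block of values to block floating point:
    one shared exponent per block, fixed-point mantissas.

    Illustrative model only -- the rounding and clipping policy
    is an assumption, not taken from the paper.
    """
    max_abs = np.max(np.abs(block))
    if max_abs == 0.0:
        return np.zeros_like(block)
    # Shared exponent: chosen from the block's largest magnitude.
    shared_exp = int(np.floor(np.log2(max_abs)))
    # Scale so mantissas fit a signed `mantissa_bits`-bit range
    # (one sign bit, mantissa_bits - 1 magnitude bits).
    scale = 2.0 ** (shared_exp - (mantissa_bits - 2))
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / scale), lo, hi)
    return mantissas * scale
```

Because every value in a block shares the exponent, the multiply-accumulate datapath for a block reduces to fixed-point arithmetic, which is where the LUT/DSP savings reported in the abstract come from; the quantization error is bounded by the shared scale step.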
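The Winograd filtering algorithm mentioned in the abstract reduces the multiplication count of small convolutions. As a minimal illustration (not the paper's 2-D hardware mapping), the standard F(2,3) variant computes two outputs of a 3-tap convolution with 4 multiplies instead of 6, using the well-known transform matrices from Lavin and Gray:

```python
import numpy as np

# Winograd F(2,3): input, filter, and output transform matrices.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of a 3-tap convolution from a 4-element input tile.
    The elementwise product in the transform domain uses 4 multiplies
    where direct convolution would use 6."""
    return A_T @ ((G @ g) * (B_T @ d))
```

In a BFP accelerator, these transform-domain elementwise products are exactly the multiplies that can run in low-bitwidth fixed point once a block shares its exponent.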
Pages: 218-223 (6 pages)
References (15 in total)
[1] [Anonymous], 1980, ARITHMETIC COMPLEXIT.
[2] Fernandez-Marques, J., 2020, SEARCHING WINOGRAD A, p. 10711.
[3] Guan, Y.; Liang, H.; Xu, N.; Wang, W.; Shi, S.; Chen, X.; Sun, G.; Zhang, W.; Cong, J. FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates. 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2017), 2017, pp. 152-159.
[4] He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[5] Kala, S.; Jose, B. R.; Mathew, J.; Nalesh, S. High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019, 27(12): 2816-2828.
[6] Kaul, H.; Anders, M.; Mathew, S.; Kim, S.; Krishnamurthy, R. Optimized Fused Floating-Point Many-Term Dot-Product Hardware for Machine Learning Accelerators. 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH), 2019, pp. 84-87.
[7] Lavin, A.; Gray, S. Fast Algorithms for Convolutional Neural Networks. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 4013-4021.
[8] Lian, X.; Liu, Z.; Song, Z.; Dai, J.; Zhou, W.; Ji, X. High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019, 27(8): 1874-1885.
[9] Liang, Y.; Lu, L.; Xiao, Q.; Yan, S. Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 39(4): 857-870.
[10] Liao, H., 2019, IEEE Hot Chips 31 Symposium, p. 1. DOI: 10.1109/HOTCHIPS.2019.8875654.