An FPGA-based bit-level weight sparsity and mixed-bit accelerator for neural networks

Cited: 0
Authors
Hu, Xianghong [1 ,2 ]
Fu, Shansen [1 ,2 ]
Lin, Yuanmiao [1 ,2 ]
Li, Xueming [1 ,2 ]
Yang, Chaoming [1 ,2 ]
Li, Rongfeng [1 ,2 ]
Huang, Hongmin [1 ,2 ,3 ]
Cai, Shuting [1 ,2 ]
Xiong, Xiaoming [1 ,2 ]
Affiliations
[1] Guangdong Univ Technol, Sch Integrated Circuits, Guangzhou 510006, Peoples R China
[2] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China
[3] Guangdong Polytech Normal Univ, Sch Elect & Informat, Guangzhou 510665, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks; Hardware accelerator; Mixed-bits quantization; Bit-level weight sparsity;
DOI
10.1016/j.sysarc.2025.103463
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Bit-level weight sparsity and mixed-bit quantization are regarded as effective methods for improving the computing efficiency of convolutional neural network (CNN) accelerators. However, irregular sparse matrices greatly increase the index overhead and hardware resource consumption. Moreover, bit-serial computing (BSC) is usually adopted to implement bit-level weight sparsity on accelerators, and traditional BSC leads to uneven utilization of DSP and LUT resources on FPGA platforms, limiting the overall performance of the accelerator. Therefore, in this work, we present an accelerator designed for bit-level weight sparsity and mixed-bit quantization. We first introduce a non-linear quantization algorithm named the bit-level sparsity learned quantizer (BSLQ), which maintains high accuracy under mixed-bit quantization and guides the accelerator to complete bit-level sparse weight computations using DSPs. Based on this algorithm, we implement a multi-channel bit-level sparsity (MCBS) method to mitigate irregularity and reduce the index count associated with bit-level sparsity. Finally, we propose a sparse weight arbitrary basis scratch pad (SWAB SPad) method that retrieves compressed weights without fetching activations, saving 30.52% of LUTs and 64.02% of FFs. Experimental results demonstrate that when quantizing ResNet50 and VGG16 with 4/8 bits, our approach achieves accuracy comparable to or even better than 32-bit baselines (75.98% and 73.70% for the two models). Compared with state-of-the-art FPGA-based accelerators, this accelerator achieves up to a 5.36x improvement in DSP efficiency and an 8.87x improvement in energy efficiency.
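The core idea behind bit-level weight sparsity can be illustrated with a short sketch. This is not the paper's implementation (the paper maps the computation onto DSPs with the BSLQ/MCBS schemes); it is only a hedged, minimal software model of bit-serial multiply-accumulate in which zero weight bits are skipped entirely, so sparser bit patterns cost fewer shift-add operations:

```python
def bit_serial_mac(activations, weights, weight_bits=4):
    """Accumulate sum(a * w) by iterating only over the set bits of each
    weight: each set bit contributes one shift-add, and zero bits cost
    nothing. Returns the accumulated result and the shift-add count."""
    acc = 0
    ops = 0  # number of shift-add operations actually performed
    for a, w in zip(activations, weights):
        for b in range(weight_bits):
            if (w >> b) & 1:        # zero bits are skipped entirely
                acc += a << b       # shift-add replaces a full multiply
                ops += 1
    return acc, ops

# Example: three activations, three 4-bit weights with 2, 1, and 1 set
# bits, so only 4 shift-adds are needed instead of 3 full multiplies.
acts = [3, 5, 2]
wts = [0b0101, 0b0010, 0b1000]   # i.e. 5, 2, 8
result, ops = bit_serial_mac(acts, wts)
assert result == sum(a * w for a, w in zip(acts, wts))  # 15 + 10 + 16 = 41
```

In hardware, the per-bit loop becomes a serial schedule of shift-add cycles, which is why reducing the number of set weight bits (and regularizing where they fall, as MCBS does across channels) directly shortens execution time.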
Pages: 12