An FPGA-based bit-level weight sparsity and mixed-bit accelerator for neural networks

Cited by: 0

Authors
Hu, Xianghong [1,2]
Fu, Shansen [1,2]
Lin, Yuanmiao [1,2]
Li, Xueming [1,2]
Yang, Chaoming [1,2]
Li, Rongfeng [1,2]
Huang, Hongmin [1,2,3]
Cai, Shuting [1,2]
Xiong, Xiaoming [1,2]
Affiliations
[1] Guangdong Univ Technol, Sch Integrated Circuits, Guangzhou 510006, Peoples R China
[2] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China
[3] Guangdong Polytech Normal Univ, Sch Elect & Informat, Guangzhou 510665, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Convolutional neural networks; Hardware accelerator; Mixed-bit quantization; Bit-level weight sparsity
DOI
10.1016/j.sysarc.2025.103463
CLC Classification
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Bit-level weight sparsity and mixed-bit quantization are regarded as effective methods for improving the computing efficiency of convolutional neural network (CNN) accelerators. However, irregular sparse matrices greatly increase index overhead and hardware resource consumption. Moreover, bit-serial computing (BSC) is usually adopted to implement bit-level weight sparsity on accelerators, and traditional BSC leads to uneven utilization of DSP and LUT resources on FPGA platforms, limiting the overall performance of the accelerator. In this work, we therefore present an accelerator designed for bit-level weight sparsity and mixed-bit quantization. We first introduce a non-linear quantization algorithm, the bit-level sparsity learned quantizer (BSLQ), which maintains high accuracy under mixed-bit quantization and guides the accelerator to perform bit-level sparse weight computations on DSPs. Based on this algorithm, we implement a multi-channel bit-level sparsity (MCBS) method that mitigates irregularity and reduces the index count associated with bit-level sparsity. Finally, we propose a sparse weight arbitrary basis scratch pad (SWAB SPad) method that retrieves compressed weights without fetching activations, saving 30.52% of LUTs and 64.02% of FFs. Experimental results demonstrate that when quantizing ResNet50 and VGG16 to 4/8 bits, our approach achieves accuracy comparable to or better than the 32-bit baseline (75.98% and 73.70% for the two models, respectively). Compared with state-of-the-art FPGA-based accelerators, our accelerator achieves up to a 5.36x improvement in DSP efficiency and an 8.87x improvement in energy efficiency.
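As a rough illustration of the bit-level weight sparsity idea described in the abstract, the Python sketch below models how a multiply-accumulate can skip the zero bits of each weight by using one shift-and-add per nonzero bit. This is only a minimal software model of the general technique under our own assumptions; the function names are hypothetical, and it does not reproduce the paper's BSLQ, MCBS, or SWAB SPad designs.

def nonzero_bit_positions(w):
    """Return the positions of the set bits of a non-negative integer weight."""
    positions = []
    pos = 0
    while w:
        if w & 1:
            positions.append(pos)
        w >>= 1
        pos += 1
    return positions

def bit_serial_mac(activations, weights):
    """Accumulate sum(a * w) while processing only nonzero weight bits.

    Uses the identity a * w == sum(a << p for each set-bit position p of w).
    Zero bits contribute nothing, so a bit-sparse weight needs fewer
    shift-and-add steps than a dense bit-serial multiply would.
    """
    acc = 0
    for a, w in zip(activations, weights):
        for p in nonzero_bit_positions(w):
            acc += a << p  # one shift-and-add per essential (nonzero) bit
    return acc

# Example: w = 0b0101 (= 5) has set bits at positions 0 and 2,
# so 3 * 5 is computed as (3 << 0) + (3 << 2) = 3 + 12 = 15.
assert bit_serial_mac([3, 2], [5, 4]) == 3 * 5 + 2 * 4

In hardware, each shift-and-add corresponds to roughly one cycle of a bit-serial processing element, which is why pruning individual weight bits (rather than whole weights) translates directly into fewer compute cycles.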
Pages: 12