WRA-MF: A Bit-Level Convolutional-Weight-Decomposition Approach to Improve Parallel Computing Efficiency for Winograd-Based CNN Acceleration

Citations: 0

Authors:

Xiang, Siwei [1]

Lv, Xianxian [1]

Meng, Yishuo [1]

Wang, Jianfei [1]

Lu, Cimang [2]

Yang, Chen [1]

Affiliations:

[1] Xi An Jiao Tong Univ, Sch Microelect, Xian 710049, Peoples R China

[2] Shenzhen Xinrai Sinovoice Technol Co Ltd, Shenzhen 518000, Peoples R China

Source:

ELECTRONICS | 2023 / Vol. 12 / Issue 24

Funding:

National Natural Science Foundation of China;

Keywords:

convolutional neural networks; acceleration algorithm; convolution weight decomposition; multiplication reduction; hardware efficiency; SYSTEM;

DOI:

10.3390/electronics12244943

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

FPGA-based convolutional neural network (CNN) accelerators have been extensively studied recently. To exploit the parallelism of multiplier-accumulator computation in convolution, most FPGA-based CNN accelerators depend heavily on the number of on-chip DSP blocks in the FPGA. Consequently, the performance of the accelerators is restricted by the limited number of DSPs, leading to an imbalance in the utilization of other FPGA resources. This work proposes a multiplication-free convolutional acceleration scheme (named WRA-MF) to relax the pressure on the required DSP resources. First, the proposed WRA-MF employs the Winograd algorithm to reduce the computational density, and it then performs bit-level convolutional weight decomposition to eliminate the multiplication operations. Furthermore, by extracting common factors, the complexity of the addition operations is reduced. Experimental results on the Xilinx XCVU9P platform show that the WRA-MF can achieve 7559 GOP/s throughput at a 509 MHz clock frequency for VGG16. Compared with state-of-the-art works, the WRA-MF achieves a 3.47x to 27.55x area efficiency improvement. The results indicate that the proposed architecture achieves a high area efficiency while ameliorating the imbalance in the resource utilization.
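The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the WRA-MF architecture itself: it assumes the textbook 1-D Winograd F(2,3) transform and a plain shift-and-add multiplier, and the function names `winograd_f2_3` and `shift_add_multiply` are hypothetical. The first function computes two outputs of a 3-tap filter with four multiplications instead of six; the second shows how a product with an integer weight can be formed from the weight's bits using only shifts and additions.

```python
def winograd_f2_3(d, g):
    """Textbook Winograd F(2,3): two outputs of a 3-tap filter over
    four inputs, using four multiplications instead of six."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f2_3(d, g):
    """Reference sliding-window computation for comparison."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def shift_add_multiply(x, w):
    """Multiply integer x by integer weight w without a multiplier:
    decompose |w| into bits and add the correspondingly shifted x."""
    magnitude = abs(w)
    acc = 0
    k = 0
    while magnitude >> k:
        if (magnitude >> k) & 1:   # bit k of the weight is set
            acc += x << k          # contributes x * 2**k
        k += 1
    return -acc if w < 0 else acc


if __name__ == "__main__":
    print(winograd_f2_3([1, 2, 3, 4], [1, 0, -1]))
    print(shift_add_multiply(7, 13))
```

In hardware, the shifts are fixed wiring and only the additions consume logic, which is the general reason such a decomposition can trade DSP blocks for LUT-based adders.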


Pages: 17