Accelerating Sparse Convolutional Neural Networks with Systolic Arrays on FPGA

Cited by: 1

Authors:

Nehete, Hemkant [1]

Verma, Gaurav [1]

Yadav, Shailendra [1]

Kaushik, Brajesh Kumar [1]

Affiliations:

[1] Indian Inst Technol Roorkee, Dept Elect & Commun Engn, Roorkee 247667, Uttar Pradesh, India

Source:

APPLICATIONS OF MACHINE LEARNING 2023 | 2023 / Vol. 12675

Keywords:

Sparse CNN Accelerator; FPGA; Compressed Sparse Row (CSR); Systolic array

DOI:

10.1117/12.2676783

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Convolutional Neural Networks (CNNs) are frequently used in a wide range of applications, including speech recognition, image recognition, and natural language processing. However, due to the computational complexity of CNNs, deploying these networks on resource-limited edge devices has become a significant challenge. Sparse CNNs use the sparsity in the weight matrices of the networks to minimize computations while maintaining accuracy. By storing only the nonzero values, the Compressed Sparse Row (CSR) format compresses the sparse matrix, lowering the memory requirement and computational complexity of the network. This work presents a novel approach for accelerating Sparse CNNs on Field-Programmable Gate Arrays (FPGAs) using the CSR format and systolic arrays. The proposed method takes advantage of systolic arrays' parallel processing capabilities to perform CSR-based sparse convolutions. Furthermore, an algorithm is presented that optimizes the data layout to maximize data reuse and minimize data movement between different processing elements of the systolic array and external memory. The architecture is evaluated and compared to a state-of-the-art GPU implementation on several benchmark datasets. The proposed architecture outperformed the GPU-based implementation in terms of throughput and power efficiency by 1.42x and 22.4x, respectively. The presented approach provides a promising solution for accelerating Sparse CNNs on resource-constrained devices and enabling the deployment of these networks in a variety of applications.
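The abstract does not give the details of the CSR encoding, so the following is only a minimal software sketch of the general CSR idea it relies on: keep the nonzero weights, their column indices, and one pointer per row, then multiply by touching only the stored nonzeros. The function names `dense_to_csr` and `csr_matvec` and the example weights are hypothetical and chosen for illustration; this is not the authors' FPGA data layout or systolic-array mapping.

```python
from typing import List, Tuple


def dense_to_csr(matrix: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Encode a dense matrix as CSR: (values, col_indices, row_ptr)."""
    values: List[float] = []
    col_indices: List[int] = []
    row_ptr: List[int] = [0]
    for row in matrix:
        for col, weight in enumerate(row):
            if weight != 0:
                values.append(weight)     # keep only nonzero weights
                col_indices.append(col)   # remember which column each came from
        row_ptr.append(len(values))       # end offset of this row in `values`
    return values, col_indices, row_ptr


def csr_matvec(values: List[float], col_indices: List[int],
               row_ptr: List[int], x: List[float]) -> List[float]:
    """Multiply a CSR matrix by a dense vector, skipping all zero weights."""
    y: List[float] = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


# Hypothetical 3x4 pruned weight matrix (illustrative values only).
weights = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 3.0],
]
values, col_indices, row_ptr = dense_to_csr(weights)
activations = [1.0, 2.0, 3.0, 4.0]
outputs = csr_matvec(values, col_indices, row_ptr, activations)
print(values, col_indices, row_ptr, outputs)
```

In this toy case only 3 of the 12 weights are stored and only 3 multiply-accumulate operations are performed, which is the kind of memory and compute saving the abstract attributes to CSR.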

Pages: 8

References (12 in total):

[1] ERIDANUS: Efficiently Running Inference of DNNs Using Systolic Arrays [J].
Asgari, Bahar; Hadidi, Ramyad; Kim, Hyesoon; Yalamanchili, Sudhakar.
IEEE MICRO, 2019, 39 (05): 46-54

[2] Perceptual Enhancement for Autonomous Vehicles: Restoring Visually Degraded Images for Context Prediction via Adversarial Training [J].
Ding, Feng; Yu, Keping; Gu, Zonghua; Li, Xiangjun; Shi, Yunqing.
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (07): 9430-9441

[3] Kamthan S.
Memories - Materials Devices Circuits and Systems, 2022, 2: 100016. DOI: 10.1016/j.memori.2022.100016

[4] Pruning and quantization for deep neural network acceleration: A survey [J].
Liang, Tailin; Glossner, John; Wang, Lei; Shi, Shaobo; Zhang, Xiaotong.
NEUROCOMPUTING, 2021, 461: 370-403

[5] OMNI: A Framework for Integrating Hardware and Software Optimizations for Sparse CNNs [J].
Liang, Yun; Lu, Liqiang; Xie, Jiaming.
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2021, 40 (08): 1648-1661

[6] An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs [J].
Lu, Liqiang; Xie, Jiaming; Huang, Ruirui; Zhang, Jiansong; Lin, Wei; Liang, Yun.
2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2019: 17-25

[7] A survey of FPGA-based accelerators for convolutional neural networks [J].
Mittal, Sparsh.
NEURAL COMPUTING & APPLICATIONS, 2020, 32 (04): 1109-1139

[8] Natsui M.
Memories - Materials Devices Circuits and Systems, 2023, 4: 100035. DOI: 10.1016/j.memori.2023.100035

[9] FPGA Based CNN Accelerator for High-Speed Biomedical Application [J].
Nehete, Hemkant; Verma, Gaurav; Gupta, Avi; Kaushik, Partha; Kaushik, Brajesh Kumar.
HIGH-SPEED BIOMEDICAL IMAGING AND SPECTROSCOPY VIII, 2023, 12390

[10] An Efficient Hardware Accelerator for Block Sparse Convolutional Neural Networks on FPGA [J].
Yin, Xiaodi; Wu, Zhipeng; Li, Dejian; Shen, Chongfei; Liu, Yu.
IEEE EMBEDDED SYSTEMS LETTERS, 2024, 16 (02): 158-161
