A 119.64 GOPs/W FPGA-Based ResNet50 Mixed-Precision Accelerator Using the Dynamic DSP Packing

Cited by: 2

Authors:

Ou, Yaozhong ^{[1]}

Yu, Wei-Han ^{[1]}

Un, Ka-Fai ^{[1]}

Chan, Chi-Hang ^{[1]}

Zhu, Yan ^{[1]}

Affiliations:

[1] Univ Macau, State Key Lab Analog & Mixed Signal VLSI, Macau, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS | 2024 / Vol. 71 / No. 05

Funding:

National Natural Science Foundation of China;

Keywords:

Quantization (signal); Sensitivity; Throughput; Bandwidth; Computational modeling; Clocks; Convolutional neural network (CNN); mixed-precision quantization; field programmable gate array (FPGA); digital signal processor (DSP); image classification; CONVOLUTIONAL NEURAL-NETWORKS;

DOI:

10.1109/TCSII.2024.3377356

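Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes: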

0808 ; 0809 ;

Abstract:

This brief presents a precision-sensitivity-aware quantization (PSAQ) mixed-precision (MP) compression scheme for both weights and activations. The PSAQ MP method achieves a better trade-off between accuracy and energy efficiency, maintaining 75.6% top-1 accuracy on ResNet-50 and achieving a 2.06× reduction in normalized operations with less than 1% accuracy difference from the baseline. We propose two DSP-pipeline-friendly methods, dynamic DSP packing (DDP) and fully pre-calibrated (FPC) unpacking, which pack multiple operations into a single DSP without error, at the cost of only one extra clock cycle and slight logic overhead compared with an unpacked design. With these methods, the accelerator supports MP algorithms while using DSP bandwidth efficiently. Combined with the router network and an optimized dataflow, our MP accelerator achieves 330.15 GOP/s throughput and 119.64 GOPs/W energy efficiency with 2.27-b weights and 3.61-b input feature maps (ifmaps).
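As a rough illustration of what packing multiple operations into a single DSP means, the sketch below shows the generic operand-packing idea: two low-precision values that share a multiplicand are placed in separate bit fields of one wide operand, multiplied once, and the two products are then separated with a sign correction. This is not the paper's DDP or FPC method, whose details the abstract does not give. The function name `pack_multiply` and the 18-bit field offset `shift` are hypothetical choices made only for this example.

```python
def pack_multiply(a: int, b: int, w: int, shift: int = 18) -> tuple[int, int]:
    """Return (a*w, b*w) using a single wide multiplication.

    Assumption for this sketch: a, b, w are small signed integers such that
    |b*w| < 2**(shift - 1), so the low product fits in its own bit field.
    """
    # Pack both operands into one wide word: a in the high field, b in the low field.
    packed = (a << shift) + b
    # One multiplication yields (a*w << shift) + b*w.
    product = packed * w

    # Unpack the low field and sign-extend it to recover b*w.
    low = product & ((1 << shift) - 1)
    if low >= 1 << (shift - 1):
        low -= 1 << shift

    # Subtract the low product before shifting; this removes the borrow that a
    # negative b*w would otherwise leak into the high field.
    high = (product - low) >> shift
    return high, low


if __name__ == "__main__":
    # Two 4-bit signed activations sharing one 4-bit signed weight.
    print(pack_multiply(-3, 5, -7))
```

The subtraction before the shift is the correction step: without it, a negative low product would make the high result off by one. Lower bit widths leave room for more fields per word, which is why packing pairs naturally with mixed-precision quantization.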


Pages: 2554-2558

Page count: 5

