Multiplication-Free Lookup-Based CNN Accelerator Using Residual Vector Quantization and Its FPGA Implementation

Authors
Fuketa, Hiroshi [1 ]
Katashita, Toshihiro [1 ]
Hori, Yohei [1 ]
Hioki, Masakazu [1 ]
Affiliation
[1] Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki 3058568, Japan
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Convolutional neural networks; Accuracy; Field programmable gate arrays; Vector quantization; Training; Task analysis; Associative memory; computing-in-memory; convolutional neural network; FPGA; vector quantization
DOI
10.1109/ACCESS.2024.3432979
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
In this paper, a table-lookup-based computing technique is proposed to perform convolutional neural network (CNN) inference without multiplication, and its FPGA implementation is demonstrated as a proof of concept. Hardware dedicated to lookup-based dot-product approximation (LDA) has previously been proposed to achieve energy-efficient CNN computation for edge AI applications. However, it has not been applied to complex AI tasks, such as ImageNet image classification, because LDA degrades inference accuracy, especially on such tasks. Therefore, a new LDA technique using residual vector quantization, called RLDA, is proposed in this study. By applying the proposed RLDA to a CNN, we achieve an inference accuracy degradation of less than 5% for a ResNet-18 model on ImageNet 1000-class classification. In addition, the proposed RLDA-based CNN accelerator is implemented on a ZCU104 evaluation board, which includes a Zynq UltraScale+ FPGA and DDR4 DRAM. We compare the processing time of the proposed accelerator on the ZCU104 with that of an NVIDIA Jetson AGX Orin and show that the processing times are comparable in the lower layers of the ResNet-18 model. Finally, the computational performance of the proposed accelerator is compared with that of conventional FPGA-based accelerators. The proposed accelerator requires no digital signal processor (DSP) blocks, which are commonly available in modern FPGAs. We demonstrate that it achieves more than four times the performance of a conventional FPGA-based accelerator without DSPs.
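The core idea in the abstract can be illustrated with a minimal sketch of lookup-based dot-product approximation built on residual vector quantization: inputs are quantized stage by stage (each stage encoding the residual left by the previous one), and the dot product with a fixed weight vector is then recovered by table lookups and additions only. All sizes, names, and the random codebooks below are hypothetical illustrations, not the paper's actual RLDA configuration or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, STAGES = 8, 16, 2  # sub-vector dimension, codewords per stage, RVQ stages (hypothetical)

# Hypothetical codebooks, one per residual stage (in practice these would be trained)
codebooks = [rng.standard_normal((K, D)) for _ in range(STAGES)]
w = rng.standard_normal(D)  # a fixed weight sub-vector, e.g. part of a convolution kernel

# Offline: precompute one lookup table per stage, table[s][k] = <codebooks[s][k], w>.
# After this step, inference needs no multiplications.
tables = [cb @ w for cb in codebooks]

def rvq_encode(x):
    """Residual vector quantization: each stage quantizes the residual of the previous one."""
    codes, residual = [], x.copy()
    for cb in codebooks:
        k = int(np.argmin(np.linalg.norm(residual - cb, axis=1)))
        codes.append(k)
        residual = residual - cb[k]
    return codes

def lda_dot(codes):
    """Approximate <x, w> using only table lookups and additions."""
    return sum(tables[s][k] for s, k in enumerate(codes))

x = rng.standard_normal(D)
codes = rvq_encode(x)
approx = lda_dot(codes)   # multiplication-free approximation of x @ w
exact = float(x @ w)
```

The lookup result equals the exact dot product of the RVQ reconstruction with `w`, so approximation error comes only from quantization; adding residual stages tightens the reconstruction, which is the motivation for RLDA over single-stage LDA.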
Pages: 102470-102480 (11 pages)