Pipeline ShiftAddNet: An FPGA-Based CNN Implementation With Low Hardware Consumption Targeting Constrained Devices

Times cited: 0

Authors:

Kiat, Wei-Pau [1]

Lee, Wai Kong [1]

Tan, Hung-Khoon [1]

Ng, Hui-Fuang [1]

Affiliation:

[1] Univ Tunku Abdul Rahman, Kampar, Malaysia

Source:

International Journal of Circuit Theory and Applications | 2025

Keywords:

artificial intelligence; CNN; deep learning; DNN; FPGA; multiplier-less

DOI:

10.1002/cta.4419

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline Classification Code:

0808; 0809

Abstract:

ShiftAddNet is a recently proposed multiplier-less CNN that replaces conventional multiplications with cheaper shift and add operations, which makes it well suited to hardware implementation. In this paper, we present the first FPGA implementation of a ShiftAddNet inference core, which achieves low area consumption and fast computation. ShiftAddNet combines the convolutional layers of DeepShift-PS (denoted Shift-Accumulate, sac) and AdderNet (denoted Add-Accumulate, aac) into a single computational stage. As a result, there are data dependencies between sac and aac that prevent them from executing in parallel, leading to 2× more operations than other multiplier-less CNNs such as DeepShift-PS and AdderNet. To overcome this performance bottleneck, we propose a novel technique that enables pipelined processing of sac and aac, effectively reducing latency. The proposed ShiftAddNet-18, evaluated on a small ResNet-18, achieves a latency of 11.37 ms per image, ~69.21% faster than the original version, which takes 19.24 ms. On a denser network, the proposed pipeline ShiftAddNet-101 requires only 61.92 ms compared with 98.85 ms for the original version, a latency reduction of ~37.1%. Compared with the state-of-the-art multiplier-less CNN core (e.g., AdderNet), our work is 20% slower in latency but provides higher accuracy and consumes 2.2× fewer DSPs.
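The two operations named in the abstract can be illustrated with a minimal Python sketch. It assumes integer (fixed-point) activations and encodes each DeepShift-PS-style weight as a hypothetical `(sign, power)` pair meaning sign × 2^power, so the product reduces to a bit shift plus an optional negation; the AdderNet-style stage uses the negative L1 distance, which needs only subtraction and addition. The function names (`shift_mul`, `sac`, `aac`, `shiftadd_layer`) are illustrative and not taken from the paper. The last function shows where the dependency comes from: the add stage reads every value the shift stage produced.

```python
from typing import List, Tuple

# Hypothetical encoding of a DeepShift-PS-style weight: (sign, power) = sign * 2**power,
# with sign in {-1, 0, +1} and power an integer (negative powers become right shifts).
ShiftWeight = Tuple[int, int]


def shift_mul(x: int, w: ShiftWeight) -> int:
    """Approximate x * sign * 2**power with a shift and a negation (exact when power >= 0)."""
    sign, power = w
    if sign == 0:
        return 0
    # Python's >> on ints is an arithmetic (flooring) shift, like a signed hardware shifter.
    shifted = x << power if power >= 0 else x >> -power
    return shifted if sign > 0 else -shifted


def sac(xs: List[int], ws: List[ShiftWeight]) -> int:
    """Shift-accumulate: sum of shifted inputs, no multiplier needed."""
    return sum(shift_mul(x, w) for x, w in zip(xs, ws))


def aac(xs: List[int], ws: List[int]) -> int:
    """Add-accumulate: negative L1 distance between inputs and weights (AdderNet style)."""
    return -sum(abs(x - w) for x, w in zip(xs, ws))


def shiftadd_layer(
    xs: List[int],
    shift_ws: List[List[ShiftWeight]],
    add_ws: List[List[int]],
) -> List[int]:
    """A shift stage followed by an add stage, mirroring ShiftAddNet's shift-then-add order."""
    # Every hidden value must be produced by the shift stage ...
    hidden = [sac(xs, row) for row in shift_ws]
    # ... before any add-stage output can be computed: the sac -> aac dependency.
    return [aac(hidden, row) for row in add_ws]


if __name__ == "__main__":
    xs = [3, -2, 5]
    shift_ws = [[(1, 1), (-1, 0), (1, -1)], [(0, 0), (1, 2), (-1, 1)]]
    add_ws = [[8, -20], [0, 0]]
    print(shiftadd_layer(xs, shift_ws, add_ws))  # [-4, -28]
```

With `xs = [3, -2, 5]`, the two shift rows give hidden values `[10, -18]`, and the add-stage rows `[8, -20]` and `[0, 0]` then yield `[-4, -28]`.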
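Why overlapping sac and aac reduces latency can be seen with a simple two-stage pipeline model. The sketch below assumes the work is split into `n_tiles` equal, hypothetical tiles, that each tile spends a fixed `t_sac` cycles in the shift stage and `t_aac` cycles in the add stage, and that the shift unit can run ahead into a buffer. It is a generic pipelining model, not the paper's actual scheduler or cycle counts; the closed form is cross-checked by a small cycle-level simulation.

```python
def sequential_cycles(n_tiles: int, t_sac: int, t_aac: int) -> int:
    """No overlap: each tile finishes its add stage before the next tile's shift stage starts."""
    return n_tiles * (t_sac + t_aac)


def pipelined_cycles(n_tiles: int, t_sac: int, t_aac: int) -> int:
    """Closed form for a two-stage pipeline: fill once, then one tile per slower-stage interval."""
    if n_tiles == 0:
        return 0
    return t_sac + t_aac + (n_tiles - 1) * max(t_sac, t_aac)


def simulate_pipeline(n_tiles: int, t_sac: int, t_aac: int) -> int:
    """Cycle-level check of the closed form, assuming the shift unit never stalls."""
    sac_done = 0  # time the shift unit finishes its current tile
    aac_done = 0  # time the add unit finishes its current tile
    for _ in range(n_tiles):
        sac_done += t_sac
        # The add unit needs this tile's shift output and must also be free itself.
        aac_done = max(aac_done, sac_done) + t_aac
    return aac_done


if __name__ == "__main__":
    seq = sequential_cycles(64, 100, 100)
    pipe = pipelined_cycles(64, 100, 100)
    print(seq, pipe, round(seq / pipe, 2))  # 12800 6500 1.97
```

For 64 balanced tiles of 100 cycles per stage, the sequential schedule takes 12,800 cycles and the pipelined one 6,500, a ratio of about 1.97. With unbalanced stages the slower stage sets the pace, so the gain shrinks toward (t_sac + t_aac) / max(t_sac, t_aac).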

References

Pages: 10

17 references in total

[1] Chen H., 2020, P IEEECVF C COMPUTER, V1468, P1477

[2] Elhoushi M., 2021, P IEEECVF C COMPUTER, V2359, P2368

[3] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.

[4] Ioffe S., 2015, 32 INT C MACHINE LEA, V37, P448

[5] Jafari, Ali; Ganesan, Ashwinkumar; Thalisetty, Chetan Sai Kumar; Sivasubramanian, Varun; Oates, Tim; Mohsenin, Tinoosh. SensorNet: A Scalable and Low-Power Deep Convolutional Neural Network for Multimodal Data Classification. IEEE Transactions on Circuits and Systems I: Regular Papers, 2019, 66(1): 274-287.

[6] Lee, Young-Chan; Um, Ji-Yong. FPGA based real-time inference machine for A-mode ultrasonic echo pattern recognition. Electronics Letters, 2023, 59(11).

[7] Li X.L., 2023, arXiv, DOI arXiv:2208.09708

[8] Lu L., 2019, 2019 IEEE 27 ANN INT, V17, P25

[9] Sun M., 2022, FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, P134, DOI 10.1145/3490422.3502364

[10] Saponara, Sergio; Elhanashi, Abdussalam; Gagliardi, Alessio. Real-time video fire/smoke detection based on CNN in antifire surveillance systems. Journal of Real-Time Image Processing, 2021, 18(3): 889-900.
