Low-Complexity Classification Technique and Hardware-Efficient Classify-Unit Architecture for CNN Accelerator

Cited by: 2

Authors:

Islam, Md Najrul [1]

Shrestha, Rahul [1]

Chowdhury, Shubhajit Roy [1]

Affiliations:

[1] Indian Inst Technol Mandi, Sch Comp & Elect Engn, Mandi, Himachal Prades, India

Source:

PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON VLSI DESIGN, VLSID 2024 AND 23RD INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS, ES 2024 | 2024

Keywords:

Convolutional neural network (CNN); very large scale integration (VLSI); digital VLSI-architecture design; field programmable gate array (FPGA); fully depleted silicon on insulator (FDSOI); and application specific integrated circuit (ASIC);

DOI:

10.1109/VLSID60093.2024.00041

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper proposes a simplified classification technique that reduces the complexity of softmax-based classification in the convolutional neural network (CNN) inference engine/accelerator. It allows the CNN accelerator to classify the object directly from the activation of the fully connected (FC) layer, avoiding complex exponential and division operations. Corresponding to this technique, the work also presents a hardware-efficient VLSI architecture of the classify unit for the CNN accelerator. The proposed classify-unit architecture has been ASIC synthesized and post-layout simulated in the 28 nm FDSOI technology node. The design delivers a peak throughput of 2.5 GIPS with a hardware efficiency of 5.05 × 10^3 GIPS/mW/mm^2. Comparison with relevant reported works indicates that the proposed classify unit achieves 24.1× smaller area and 12.5× better hardware efficiency than state-of-the-art implementations. Finally, a complete CNN accelerator integrated with the proposed classify unit has been functionally validated on a Zynq UltraScale+ ZCU102 FPGA board in a real-world scenario, using the MobileNet-V2 CNN model.
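
The abstract does not spell out the technique itself, so the following is only a minimal sketch of the general property that makes softmax-free classification possible: softmax is strictly increasing in each input, so the index of the largest FC activation is also the index of the largest softmax probability. The function names and the sample activation values are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Reference softmax: needs one exponential per class plus a division."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_softmax(logits):
    """Conventional path: compute probabilities, then pick the largest."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def classify_direct(logits):
    """Simplified path: pick the largest FC activation directly.

    Only comparisons are needed; no exponential or division is evaluated.
    """
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical FC-layer activations for a 4-class problem.
fc_activations = [1.2, -0.7, 3.4, 0.1]

predicted_softmax = classify_softmax(fc_activations)
predicted_direct = classify_direct(fc_activations)
print(predicted_softmax, predicted_direct)
```

In this sketch both paths select the same class index, while the direct path reduces to a comparator chain. How the paper's classify unit realizes this in hardware is not described in the abstract.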

Pages: 210 - 215

Number of pages: 6

50 items in total

[31] Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT
Zhang N.
Yang B.
Chen C.
Yin S.
Wei S.
Liu L.
IACR TRANSACTIONS ON CRYPTOGRAPHIC HARDWARE AND EMBEDDED SYSTEMS, 2020, 2020 (02) : 49 - 72
[32] Low-complexity, energy-efficient fully parallel split-radix FFT architecture
Hazarika, Jinti
Ahamed, Shaik Rafi
Nemade, Harshal B.
ELECTRONICS LETTERS, 2022, 58 (18) : 678 - 680
[33] Hardware-Efficient and Low Sensing-Time VLSI-Architecture of MED based Spectrum Sensor for Cognitive Radio
Chaurasiya, Rohit B.
Shrestha, Rahul
2019 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2019,
[34] Prototype of Low Complexity CNN Hardware Accelerator with FPGA-based PYNQ Platform for Dual-Mode Biometrics Recognition
Chen, Yu-Hsiang
Fan, Chih-Peng
Chang, Robert Chen-Hao
2020 17TH INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC 2020), 2020, : 189 - 190
[35] ARBiS: A Hardware-Efficient SRAM CIM CNN Accelerator With Cyclic-Shift Weight Duplication and Parasitic-Capacitance Charge Sharing for AI Edge Application
Zhao, Chenyang
Fang, Jinbei
Jiang, Jingwen
Xue, Xiaoyong
Zeng, Xiaoyang
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, 70 (01) : 364 - 377
[36] A low-complexity demodulation technique for spectrally efficient FDM systems using decision-feedback
Yu, Baoxian
Zhang, Han
Dai, Xianhua
IET COMMUNICATIONS, 2017, 11 (15) : 2386 - 2392
[37] Efficient Combining Technique with A Low-Complexity Detect-and-Forward Relay for Cooperative Diversity Scheme
Abd Aziz, Azlan
Iwanami, Yasunori
Okamoto, Eiji
TENCON 2009 - 2009 IEEE REGION 10 CONFERENCE, VOLS 1-4, 2009, : 123 - 128
[38] SCENIC: An Area and Energy-Efficient CNN-based Hardware Accelerator for Discernable Classification of Brain Pathologies using MRI
Naidu, Bodepu Sai Tirumala
Biswas, Shreya
Chatterjee, Rounak
Mandal, Sayak
Pratihar, Srijan
Chatterjee, Ayan
Raha, Arnab
Mukherjee, Amitava
Paluh, Janet
2022 35TH INTERNATIONAL CONFERENCE ON VLSI DESIGN (VLSID 2022) HELD CONCURRENTLY WITH 2022 21ST INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS (ES 2022), 2022, : 168 - 173
[39] Low-complexity Machine Learning Architecture for Hardware-aware True Random Number Generators Assessment and Continuous Monitoring
Spinelli, F.
Moretti, R.
Addabbo, T.
Vitolo, P.
Licciardo, G. D.
2023 18TH CONFERENCE ON PH.D RESEARCH IN MICROELECTRONICS AND ELECTRONICS, PRIME, 2023, : 221 - 224
[40] Selective Gray-Coded Bit-Plane Based Low-Complexity Motion Estimation and its Hardware Architecture
Yavuz, Seda
Celebi, Anil
Aslam, Muhammad
Urhan, Oguzhan
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2016, 62 (01) : 76 - 84
