Low-Complexity Classification Technique and Hardware-Efficient Classify-Unit Architecture for CNN Accelerator

Cited by: 2
Authors
Islam, Md Najrul [1 ]
Shrestha, Rahul [1 ]
Chowdhury, Shubhajit Roy [1 ]
Affiliations
[1] Indian Inst Technol Mandi, Sch Comp & Elect Engn, Mandi, Himachal Pradesh, India
Keywords
Convolutional neural network (CNN); very large scale integration (VLSI); digital VLSI-architecture design; field programmable gate array (FPGA); fully depleted silicon on insulator (FDSOI); application specific integrated circuit (ASIC)
DOI
10.1109/VLSID60093.2024.00041
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a simplified classification technique that reduces the complexity of softmax-based classification in a convolutional neural network (CNN) inference engine/accelerator. The technique lets the CNN accelerator classify the object directly from the activations of the fully connected (FC) layer, which avoids the complex exponential and division operations. Corresponding to the suggested technique, this work also presents a hardware-efficient VLSI architecture of a classify unit for the CNN accelerator. The proposed classify-unit architecture has been ASIC synthesized and post-layout simulated in a 28 nm FDSOI technology node. The design delivers a peak throughput of 2.5 GIPS with a hardware efficiency of 5.05 × 10^3 GIPS/mW/mm^2. Compared with relevant reported works, the proposed classify unit occupies 24.1× less area and achieves 12.5× better hardware efficiency than state-of-the-art implementations. Finally, a complete CNN accelerator integrated with the proposed classify unit has been functionally validated on a Zynq UltraScale+ ZCU102 FPGA board in a real-world scenario, using the MobileNet-V2 CNN model.
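The abstract does not spell out the classification rule, so the following is only a minimal sketch under a stated assumption: that classifying "directly from the activation of the FC layer" amounts to picking the index of the largest activation. Softmax is strictly increasing in each input, so that index matches the softmax winner without computing any exponentials or divisions. The function names and sample activations are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Reference softmax: needs one exponential per class and one division per class."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_softmax(logits):
    """Conventional path: compute probabilities, then take the most probable class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def classify_direct(logits):
    """Assumed simplified path: compare raw FC activations only.

    One comparison per class, no exponential and no division. This is an
    illustrative stand-in, not the authors' published classify-unit design.
    """
    best_index = 0
    for i in range(1, len(logits)):
        if logits[i] > logits[best_index]:
            best_index = i
    return best_index


# Hypothetical FC-layer activations for a 4-class problem.
fc_activations = [-1.5, 0.25, 3.0, 2.75]
assert classify_direct(fc_activations) == classify_softmax(fc_activations)
```

In hardware terms, the direct path needs only a comparator and an index register per step, which is consistent with the area savings the abstract reports, although the actual architecture may differ.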
Pages: 210-215
Number of pages: 6