Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial

Cited by: 3
Authors
Mao, Wendong [1 ]
Wang, Meiqi [1 ]
Xie, Xiaoru [2 ]
Wu, Xiao [2 ]
Wang, Zhongfeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Integrated Circuits, Shenzhen Campus, Shenzhen 518107, Guangdong, Peoples R China
[2] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210008, Peoples R China
Keywords
Hardware acceleration; sparsity; CNN; Transformer; tutorial; deep learning; flexible accelerator; neural networks; efficient; architecture
DOI
10.1109/TCSII.2023.3344681
CLC classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject classification codes
0808; 0809
Abstract
Deep neural networks (DNNs) are widely used in many fields, such as artificial intelligence generated content (AIGC) and robotics. To support these tasks efficiently, model pruning techniques have been developed to compress computation- and memory-intensive DNNs. However, directly executing these sparse models on a common hardware accelerator can cause significant under-utilization, since invalid data resulting from the sparse patterns leads to unnecessary computations and irregular memory accesses. This brief analyzes the critical issues in accelerating sparse models and provides an overview of typical hardware designs for various sparse DNNs, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformers. Following the overview, we give a practical guideline for designing efficient accelerators for sparse DNNs, with qualitative metrics to evaluate hardware overhead in different cases. In addition, we highlight potential opportunities for hardware/software/algorithm co-optimization from the perspective of sparse DNN implementation, and provide insights into recent design trends for the efficient implementation of Transformers with sparse attention, which facilitates large language model (LLM) deployment with high throughput and energy efficiency.
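The under-utilization issue mentioned in the abstract can be illustrated with a minimal sketch (not taken from the brief; the CSR encoding choice, function names, and pruning ratio are illustrative assumptions): a compressed-sparse-row matrix-vector product skips zero weights, but the resulting gathers into the input vector follow the data-dependent sparsity pattern rather than a contiguous stride, which is exactly what a dense accelerator handles poorly.

# Minimal sketch (illustrative only): CSR-format sparse matrix-vector
# product for a pruned weight matrix. The gather x[col_idx[...]] is
# data-dependent, i.e., the irregular memory access pattern the brief refers to.
import numpy as np

def dense_to_csr(w, tol=0.0):
    # Convert a pruned (mostly-zero) weight matrix into CSR arrays.
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        nz = np.nonzero(np.abs(row) > tol)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    # y = W @ x, touching only the stored non-zero weights.
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# Example: a roughly 75%-pruned layer; the result matches the dense
# product, but reads of x are no longer a simple streaming access.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)) * (rng.random((8, 16)) > 0.75)
x = rng.standard_normal(16)
vals, cols, ptr = dense_to_csr(W)
assert np.allclose(csr_matvec(vals, cols, ptr, x), W @ x)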
Pages: 1708-1714
Page count: 7
Related Papers
50 records in total
  • [41] An Efficient Hardware Architecture for DNN Training by Exploiting Triple Sparsity
    Huang, Jian
    Lu, Jinming
    Wang, Zhongfeng
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2802 - 2805
  • [42] Equinox: Training (for Free) on a Custom Inference Accelerator
    Drumond, Mario
    Coulon, Louis
    Pourhabibi, Arash
    Yuzuguler, Ahmet Caner
    Falsafi, Babak
    Jaggi, Martin
    PROCEEDINGS OF 54TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2021, 2021, : 421 - 433
  • [43] Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network
    Xiao, Hao
    Zhao, Kaikai
    Liu, Guangzhu
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (05) : 772 - 775
  • [44] Moth: A Hardware Accelerator for Neural Radiance Field Inference on FPGA
    Wang, Yuanfang
    Li, Yu
    Zhang, Haoyang
    Yu, Jun
    Wang, Kun
    2023 IEEE 31ST ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, FCCM, 2023, : 220 - 220
  • [45] SPARCNet: A Hardware Accelerator for Efficient Deployment of Sparse Convolutional Networks
    Page, Adam
    Jafari, Ali
    Shea, Colin
    Mohsenin, Tinoosh
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2017, 13 (03)
  • [46] An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs
    Lu, Liqiang
    Xie, Jiaming
    Huang, Ruirui
    Zhang, Jiansong
    Lin, Wei
    Liang, Yun
    2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2019, : 17 - 25
  • [47] Hardware Accelerator for Probabilistic Inference in 65-nm CMOS
    Khan, Osama U.
    Wentzloff, David D.
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2016, 24 (03) : 837 - 845
  • [48] An Inference Hardware Accelerator for EEG-based Emotion Detection
    Gonzalez, Hector A.
    Muzaffar, Shahzad
    Yoo, Jerald
    Elfadel, Ibrahim M.
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
  • [49] A DNN Optimization Framework with Unlabeled Data for Efficient and Accurate Reconfigurable Hardware Inference
    Chen, Kai
    Huang, Yimin
    Du, Yuan
    Shao, Zhuang
    Gu, Xingyu
    Du, Li
    Wang, Zhongfeng
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [50] Sparse-YOLO: Hardware/Software Co-Design of an FPGA Accelerator for YOLOv2
    Wang, Zixiao
    Xu, Ke
    Wu, Shuaixiao
    Liu, Li
    Liu, Lingzhi
    Wang, Dong
    IEEE ACCESS, 2020, 8 : 116569 - 116585