Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial

Cited by: 3
Authors
Mao, Wendong [1 ]
Wang, Meiqi [1 ]
Xie, Xiaoru [2 ]
Wu, Xiao [2 ]
Wang, Zhongfeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Integrated Circuits, Shenzhen Campus, Shenzhen 518107, Guangdong, Peoples R China
[2] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210008, Peoples R China
Keywords
Hardware acceleration; sparsity; CNN; Transformer; tutorial; deep learning; flexible accelerator; neural networks; efficient; architecture
DOI
10.1109/TCSII.2023.3344681
CLC classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject classification codes
0808; 0809
Abstract
Deep neural networks (DNNs) are widely used in many fields, such as artificial intelligence generated content (AIGC) and robotics. To support these tasks efficiently, model pruning techniques have been developed to compress computation- and memory-intensive DNNs. However, directly executing these sparse models on a common hardware accelerator can cause significant under-utilization, since invalid data resulting from the sparse patterns leads to unnecessary computations and irregular memory accesses. This brief analyzes the critical issues in accelerating sparse models and provides an overview of typical hardware designs for various sparse DNNs, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformers. Following the overview, we give a practical guideline for designing efficient accelerators for sparse DNNs, with qualitative metrics to evaluate hardware overhead in different cases. In addition, we highlight potential opportunities for hardware/software/algorithm co-optimization from the perspective of sparse DNN implementation, and provide insights into recent design trends for the efficient implementation of Transformers with sparse attention, which facilitates large language model (LLM) deployment with high throughput and energy efficiency.
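The under-utilization issue mentioned in the abstract can be illustrated with a minimal sketch (not taken from the brief; the CSR encoding choice, function names, and pruning ratio are illustrative assumptions): a compressed-sparse-row matrix-vector product skips zero weights, but the resulting gathers into the input vector follow the data-dependent sparsity pattern rather than a contiguous stride, which is exactly what a dense accelerator handles poorly.

# Minimal sketch (illustrative only): CSR-format sparse matrix-vector
# product for a pruned weight matrix. The gather x[col_idx[...]] is
# data-dependent, i.e., the irregular memory access pattern the brief refers to.
import numpy as np

def dense_to_csr(w, tol=0.0):
    # Convert a pruned (mostly-zero) weight matrix into CSR arrays.
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        nz = np.nonzero(np.abs(row) > tol)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    # y = W @ x, touching only the stored non-zero weights.
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# Example: a roughly 75%-pruned layer; the result matches the dense
# product, but reads of x are no longer a simple streaming access.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)) * (rng.random((8, 16)) > 0.75)
x = rng.standard_normal(16)
vals, cols, ptr = dense_to_csr(W)
assert np.allclose(csr_matvec(vals, cols, ptr, x), W @ x)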
Pages: 1708-1714
Page count: 7
Related Papers
50 records in total
  • [41] An Efficient Hardware Architecture for DNN Training by Exploiting Triple Sparsity
    Huang, Jian
    Lu, Jinming
    Wang, Zhongfeng
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2802 - 2805
  • [42] Equinox: Training (for Free) on a Custom Inference Accelerator
    Drumond, Mario
    Coulon, Louis
    Pourhabibi, Arash
    Yuzuguler, Ahmet Caner
    Falsafi, Babak
    Jaggi, Martin
    PROCEEDINGS OF 54TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2021, 2021, : 421 - 433
  • [43] Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network
    Xiao, Hao
    Zhao, Kaikai
    Liu, Guangzhu
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (05) : 772 - 775
  • [44] Moth: A Hardware Accelerator for Neural Radiance Field Inference on FPGA
    Wang, Yuanfang
    Li, Yu
    Zhang, Haoyang
    Yu, Jun
    Wang, Kun
    2023 IEEE 31ST ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, FCCM, 2023, : 220 - 220
  • [45] SPARCNet: A Hardware Accelerator for Efficient Deployment of Sparse Convolutional Networks
    Page, Adam
    Jafari, Ali
    Shea, Colin
    Mohsenin, Tinoosh
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2017, 13 (03)
  • [46] An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs
    Lu, Liqiang
    Xie, Jiaming
    Huang, Ruirui
    Zhang, Jiansong
    Lin, Wei
    Liang, Yun
    2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2019, : 17 - 25
  • [47] Hardware Accelerator for Probabilistic Inference in 65-nm CMOS
    Khan, Osama U.
    Wentzloff, David D.
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2016, 24 (03) : 837 - 845
  • [48] An Inference Hardware Accelerator for EEG-based Emotion Detection
    Gonzalez, Hector A.
    Muzaffar, Shahzad
    Yoo, Jerald
    Elfadel, Ibrahim M.
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
  • [49] A DNN Optimization Framework with Unlabeled Data for Efficient and Accurate Reconfigurable Hardware Inference
    Chen, Kai
    Huang, Yimin
    Du, Yuan
    Shao, Zhuang
    Gu, Xingyu
    Du, Li
    Wang, Zhongfeng
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [50] Sparse-YOLO: Hardware/Software Co-Design of an FPGA Accelerator for YOLOv2
    Wang, Zixiao
    Xu, Ke
    Wu, Shuaixiao
    Liu, Li
    Liu, Lingzhi
    Wang, Dong
    IEEE ACCESS, 2020, 8 : 116569 - 116585