A 40 TOPS Single-Chip Accelerator Enabling Low-Latency Inference for Deep Neural Networks

Times Cited: 0
Authors
He, Xun [1 ]
Cao, Tao [1 ]
Liu, Youjiang [1 ]
Zhong, Le [1 ]
Xiao, Guoping [1 ]
Yu, Cong [1 ]
Affiliations
[1] China Acad Engn Phys, Inst Elect Engn, Mianyang 621000, Peoples R China
Keywords
Kernel; Decoding; Sparse matrices; System-on-chip; Low-latency communication; Indexes; Costs; YOLO; Training; Throughput; Sparse; accelerator; load balance; compute-near-memory; low latency; pruning; weight compression; zero skip; Energy; Memory
DOI
10.1109/TCSII.2025.3563062
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology and Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
To achieve low latency for edge applications, a single-chip sparse accelerator is proposed that performs deep neural network (DNN) inference using only limited on-chip memory. Private memory is eliminated and all memories are shared to reduce power and chip area. An adaptive, variable-length compression algorithm is proposed to store sparse DNNs. A weak-constrained pruning algorithm is proposed to resolve the load-balance issue at the kernel level, achieving almost the same sparsity as unconstrained pruning (UCP) schemes. Based on these techniques, a low-latency inference accelerator is fabricated in 28-nm CMOS with 8256 MACs and 9.4 MB of on-chip SRAM, achieving a latency of 0.44 ms for YOLO3 tiny. For high-sparsity layers, the chip achieves a 6.1x speedup and a throughput of 40 TOPS. With a pruned YOLO model, the accelerator achieves 6.7x lower latency and 21.7x better energy efficiency than Jetson Orin. A high-speed evaluation platform is built to demonstrate real-time object detection at a throughput of 600 frames per second (fps) with a power of 1.34 W.
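The load-balance motivation behind constrained pruning can be illustrated with a generic group-balanced magnitude-pruning sketch. This is not the authors' weak-constrained algorithm; the group size, grouping along flattened weights, and top-k selection rule are all assumptions for illustration. The idea is that if every group of weights assigned to a parallel MAC lane keeps the same number of nonzeros, a zero-skipping datapath has no straggler lanes, while overall sparsity stays close to the unconstrained result:

```python
import numpy as np

def prune_unconstrained(w, sparsity):
    """Global magnitude pruning: zero the smallest-|w| entries overall.
    Maximizes accuracy for a given sparsity but leaves nonzeros
    unevenly distributed across kernels/lanes."""
    flat = np.abs(w).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return w.copy()
    thresh = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def prune_balanced(w, sparsity, group):
    """Balanced pruning: keep the same number of nonzeros in every
    block of `group` consecutive weights, so zero-skipping MAC lanes
    that each consume one block finish at the same time."""
    flat = w.ravel().copy()
    pad = (-flat.size) % group
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, group)
    keep = group - int(round(sparsity * group))
    # Keep only the `keep` largest-magnitude weights in each block.
    order = np.argsort(np.abs(blocks), axis=1)       # ascending |w|
    mask = np.zeros_like(blocks, dtype=bool)
    np.put_along_axis(mask, order[:, group - keep:], True, axis=1)
    blocks = np.where(mask, blocks, 0.0)
    return blocks.ravel()[: w.size].reshape(w.shape)
```

With a 75% target and a group size of 16, every lane keeps exactly 4 weights, whereas unconstrained pruning can leave one lane with many more nonzeros than another; the worst-case lane then bounds the layer latency in a zero-skip design.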
Pages: 848-852 (5 pages)