A 40 TOPS Single-Chip Accelerator Enabling Low-Latency Inference for Deep Neural Networks

Times Cited: 0
Authors
He, Xun [1 ]
Cao, Tao [1 ]
Liu, Youjiang [1 ]
Zhong, Le [1 ]
Xiao, Guoping [1 ]
Yu, Cong [1 ]
Affiliations
[1] China Acad Engn Phys, Inst Elect Engn, Mianyang 621000, Peoples R China
Keywords
Kernel; Decoding; Sparse matrices; System-on-chip; Low-latency communication; Indexes; Costs; YOLO; Training; Throughput; Sparse; accelerator; load balance; compute-near-memory; low latency; pruning; weight compression; zero skip; Energy; Memory
DOI
10.1109/TCSII.2025.3563062
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology and Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
To achieve low latency for edge applications, a single-chip sparse accelerator is proposed that performs deep neural network (DNN) inference using only limited on-chip memory. Private memory is eliminated and all memories are shared to reduce power and chip area. An adaptive, variable-length compression algorithm is proposed to store sparse DNNs. A weak-constrained pruning algorithm is proposed to resolve the load-balance issue at the kernel level, achieving almost the same sparsity as unconstrained pruning (UCP) schemes. Based on these techniques, a low-latency inference accelerator is fabricated in 28-nm CMOS with 8256 MACs and 9.4 MB of on-chip SRAM, achieving a latency of 0.44 ms for YOLO3 tiny. For high-sparsity layers, the chip achieves a 6.1x speedup and a throughput of 40 TOPS. With a pruned YOLO model, the accelerator achieves 6.7x lower latency and 21.7x better energy efficiency than Jetson Orin. A high-speed evaluation platform is built to demonstrate real-time object detection at a throughput of 600 frames per second (fps) with a power of 1.34 W.
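The load-balance motivation behind constrained pruning can be illustrated with a generic group-balanced magnitude-pruning sketch. This is not the authors' weak-constrained algorithm; the group size, grouping along flattened weights, and top-k selection rule are all assumptions for illustration. The idea is that if every group of weights assigned to a parallel MAC lane keeps the same number of nonzeros, a zero-skipping datapath has no straggler lanes, while overall sparsity stays close to the unconstrained result:

```python
import numpy as np

def prune_unconstrained(w, sparsity):
    """Global magnitude pruning: zero the smallest-|w| entries overall.
    Maximizes accuracy for a given sparsity but leaves nonzeros
    unevenly distributed across kernels/lanes."""
    flat = np.abs(w).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return w.copy()
    thresh = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def prune_balanced(w, sparsity, group):
    """Balanced pruning: keep the same number of nonzeros in every
    block of `group` consecutive weights, so zero-skipping MAC lanes
    that each consume one block finish at the same time."""
    flat = w.ravel().copy()
    pad = (-flat.size) % group
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, group)
    keep = group - int(round(sparsity * group))
    # Keep only the `keep` largest-magnitude weights in each block.
    order = np.argsort(np.abs(blocks), axis=1)       # ascending |w|
    mask = np.zeros_like(blocks, dtype=bool)
    np.put_along_axis(mask, order[:, group - keep:], True, axis=1)
    blocks = np.where(mask, blocks, 0.0)
    return blocks.ravel()[: w.size].reshape(w.shape)
```

With a 75% target and a group size of 16, every lane keeps exactly 4 weights, whereas unconstrained pruning can leave one lane with many more nonzeros than another; the worst-case lane then bounds the layer latency in a zero-skip design.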
Pages: 848-852 (5 pages)