Low-latency Buffering for Mixed-precision Neural Network Accelerator with MulTAP and FQPipe

Cited by: 0
|
Authors
Li, Yike [1 ,2 ]
Wang, Zheng [1 ]
Ou, Wenhui [1 ,3 ]
Liang, Chen [1 ]
Zhou, Weiyu [1 ,4 ]
Yang, Yongkui [1 ]
Chen, Chao [1 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[2] Univ Sci & Technol China, Sch Software Engn, Hefei, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Mech Sci & Engn, Wuhan, Peoples R China
[4] Xidian Univ, Xian, Peoples R China
Keywords
NN accelerator; mixed-precision; activation buffering; quantization pipeline; ENERGY;
DOI
10.1109/ISCAS58744.2024.10558641
Chinese Library Classification (CLC) number
TP39 [Applications of computers];
Discipline classification code
081203 ; 0835 ;
Abstract
Previous work has proposed precision-scalable accelerators to handle mixed-precision neural network (NN) inference on the edge, but these designs focus on reconfigurable MAC arrays while leaving the time-costly data-buffering procedure largely undiscussed. Moreover, integer-only inference cannot handle emerging NN models with various non-linear activation functions. In this work, we propose a mixed-precision NN accelerator supporting int8, int16, and fp32 arithmetic with two buffering techniques, MulTAP and FQPipe, which jointly enable low-latency data movement. Experimental results show that MulTAP and FQPipe speed up the baseline NN accelerator by 7.7x and 1.5x respectively, yielding application performance of 473.9 (int8) and 252.5 (int16) inferences per second (IPS) on YOLOv3-Tiny. A post-layout netlist in SMIC 40nm standard-cell technology demonstrates a design with an area of 26.96mm2 and a power estimate of 1.83W.
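The record does not include implementation details of MulTAP or FQPipe, but the mixed-precision setting the abstract describes rests on standard integer quantization. Below is a minimal illustrative sketch, not the paper's method: symmetric per-tensor quantization of fp32 activations to int8 or int16, as commonly used in mixed-precision inference. All function names here are assumptions introduced for illustration.

```python
import numpy as np

def quantize(x, num_bits):
    # Symmetric per-tensor quantization: map fp32 values onto a
    # signed integer grid of the requested width (8 or 16 bits).
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    dtype = np.int8 if num_bits == 8 else np.int16
    return q.astype(dtype), scale

def dequantize(q, scale):
    # Recover an fp32 approximation of the original tensor.
    return q.astype(np.float32) * scale

np.random.seed(0)
x = np.random.randn(4, 4).astype(np.float32)
q8, s8 = quantize(x, 8)
q16, s16 = quantize(x, 16)

# With the same scaling rule, int16 retains far more of the signal
# than int8, at the cost of wider datapaths and buffers.
err8 = np.max(np.abs(dequantize(q8, s8) - x))
err16 = np.max(np.abs(dequantize(q16, s16) - x))
```

The trade-off shown here (int16 roughly halves throughput but sharply reduces quantization error) is consistent with the abstract's reported 473.9 IPS (int8) versus 252.5 IPS (int16) on YOLOv3-Tiny.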
Pages: 5
Related Papers
50 records total
  • [11] Low-Latency Intrusion Detection Using a Deep Neural Network
    Bin Ahmad, Umair
    Akram, Muhammad Arslan
    Mian, Adnan Noor
    IT PROFESSIONAL, 2022, 24 (03) : 67 - 72
  • [12] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [13] Low-Latency Neural Stereo Streaming
    Hou, Qiqi
    Farhadzadeh, Farzad
    Said, Amir
    Sautiere, Guillaume
    Le, Hoang
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 7974 - 7984
  • [14] Low-Latency Neural Speech Translation
    Niehues, Jan
    Ngoc-Quan Pham
    Thanh-Le Ha
    Sperber, Matthias
    Waibel, Alex
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1293 - 1297
  • [15] Low-latency remote-offloading system for accelerator
    Saito, Shogo
    Fujimoto, Kei
    Shiraga, Akinori
    ANNALS OF TELECOMMUNICATIONS, 2024, 79 (3-4) : 179 - 196
  • [16] Low-latency remote-offloading system for accelerator
    Shogo Saito
    Kei Fujimoto
    Akinori Shiraga
    Annals of Telecommunications, 2024, 79 : 179 - 196
  • [17] Fair DMA Scheduler for Low-Latency Accelerator Offloading
    Otani, Ikuo
    Fujimoto, Kei
    Shiraga, Akinori
    2022 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING, ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM, 2022, : 26 - 32
  • [18] Training Low-Latency Spiking Neural Network with Orthogonal Spiking Neurons
    Yao, Yunpeng
    Wu, Man
    Zhang, Renyuan
    2023 21ST IEEE INTERREGIONAL NEWCAS CONFERENCE, NEWCAS, 2023
  • [19] EdgeDRNN: Enabling Low-latency Recurrent Neural Network Edge Inference
    Gao, Chang
    Rios-Navarro, Antonio
    Chen, Xi
    Delbruck, Tobi
    Liu, Shih-Chii
    2020 2ND IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2020), 2020, : 41 - 45
  • [20] DPSNN: spiking neural network for low-latency streaming speech enhancement
    Sun, Tao
    Bohte, Sander
    NEUROMORPHIC COMPUTING AND ENGINEERING, 2024, 4 (04)