Low-latency Buffering for Mixed-precision Neural Network Accelerator with MulTAP and FQPipe

Cited by: 0

Authors:

Li, Yike [1,2]

Wang, Zheng [1]

Ou, Wenhui [1,3]

Liang, Chen [1]

Zhou, Weiyu [1,4]

Yang, Yongkui [1]

Chen, Chao [1]

Affiliations:

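[1] Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, People's Republic of China

[2] School of Software Engineering, University of Science and Technology of China, Hefei, People's Republic of China

[3] School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, People's Republic of China

[4] Xidian University, Xi'an, People's Republic of China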

Source:

2024 IEEE International Symposium on Circuits and Systems (ISCAS 2024) | 2024

Keywords:

NN accelerator; mixed-precision; activation buffering; quantization pipeline; energy

DOI:

10.1109/ISCAS58744.2024.10558641

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Previous work has proposed precision-scalable accelerators to handle mixed-precision neural network (NN) inference on the edge. These designs focus on reconfigurable MAC arrays and leave the time-consuming data buffering procedure largely unaddressed. In addition, integer-only inference cannot handle emerging NN models that use a variety of non-linear activation functions. In this work, we propose a mixed-precision NN accelerator supporting int8, int16 and fp32 arithmetic with two buffering techniques, MulTAP and FQPipe, which jointly facilitate low-latency data movement. Experimental results show that MulTAP and FQPipe speed up the baseline NN accelerator by 7.7x and 1.5x, respectively, yielding an application performance of 473.9 (int8) and 252.5 (int16) inferences per second (IPS) on YOLOv3-Tiny. A post-layout netlist in SMIC 40 nm standard-cell technology gives a design with an area of 26.96 mm² and an estimated power of 1.83 W.
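The throughput and power figures in the abstract can be turned into per-inference quantities with simple arithmetic. The sketch below is only illustrative and is not taken from the paper. It assumes that the average time per inference is the reciprocal of the reported IPS, and that the 1.83 W post-layout estimate applies while running YOLOv3-Tiny at both precisions. The abstract does not state either assumption. The function names `latency_ms` and `energy_mj` are hypothetical.

```python
# Figures quoted in the abstract.
IPS = {"int8": 473.9, "int16": 252.5}  # inferences per second on YOLOv3-Tiny
POWER_W = 1.83                         # post-layout power estimate, in watts


def latency_ms(ips: float) -> float:
    """Average time per inference in milliseconds.

    ASSUMPTION: throughput is the reciprocal of per-inference time,
    i.e. inferences are not overlapped or batched.
    """
    return 1000.0 / ips


def energy_mj(ips: float, power_w: float = POWER_W) -> float:
    """Rough energy per inference in millijoules.

    ASSUMPTION: the single power estimate from the abstract holds for
    this workload and precision; the abstract does not confirm this.
    """
    return 1000.0 * power_w / ips


if __name__ == "__main__":
    for precision, ips in IPS.items():
        print(f"{precision}: {latency_ms(ips):.2f} ms/inference, "
              f"~{energy_mj(ips):.2f} mJ/inference")
    ratio = IPS["int8"] / IPS["int16"]
    print(f"int8 vs int16 throughput ratio: {ratio:.2f}x")
```

Under these assumptions, int8 inference takes roughly 2.1 ms and int16 roughly 4.0 ms per frame. The energy values are order-of-magnitude estimates at best, since power normally varies with precision and workload.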


Pages: 5

