Low-latency Buffering for Mixed-precision Neural Network Accelerator with MulTAP and FQPipe

Cited by: 0
Authors
Li, Yike [1, 2]
Wang, Zheng [1]
Ou, Wenhui [1, 3]
Liang, Chen [1]
Zhou, Weiyu [1, 4]
Yang, Yongkui [1]
Chen, Chao [1]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[2] Univ Sci & Technol China, Sch Software Engn, Hefei, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Mech Sci & Engn, Wuhan, Peoples R China
[4] Xidian Univ, Xian, Peoples R China
Keywords
NN accelerator; mixed-precision; activation buffering; quantization pipeline; ENERGY
DOI
10.1109/ISCAS58744.2024.10558641
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Previous work has proposed precision-scalable accelerators to handle mixed-precision neural network (NN) inference on the edge, focusing on the design of reconfigurable MAC arrays while leaving the time-costly data buffering procedure less discussed. Moreover, integer-only inference cannot handle emerging NN models with various non-linear activation functions. In this work, we propose a mixed-precision NN accelerator supporting int8, int16 and fp32 arithmetic with two buffering techniques, namely MulTAP and FQPipe, which jointly facilitate low-latency data movement. Experimental results show that MulTAP and FQPipe speed up the baseline NN accelerator by 7.7x and 1.5x respectively, leading to an application performance of 473.9 (int8) and 252.5 (int16) inferences per second (IPS) on YOLOv3-Tiny. A post-layout netlist in SMIC 40 nm standard-cell technology demonstrates a design with an area of 26.96 mm² and an estimated power of 1.83 W.
Pages: 5