Low-latency Buffering for Mixed-precision Neural Network Accelerator with MulTAP and FQPipe

Cited by: 0
|
Authors
Li, Yike [1 ,2 ]
Wang, Zheng [1 ]
Ou, Wenhui [1 ,3 ]
Liang, Chen [1 ]
Zhou, Weiyu [1 ,4 ]
Yang, Yongkui [1 ]
Chen, Chao [1 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[2] Univ Sci & Technol China, Sch Software Engn, Hefei, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Mech Sci & Engn, Wuhan, Peoples R China
[4] Xidian Univ, Xian, Peoples R China
Keywords
NN accelerator; mixed-precision; activation buffering; quantization pipeline; ENERGY;
DOI
10.1109/ISCAS58744.2024.10558641
Chinese Library Classification (CLC) number
TP39 [Applications of computers];
Discipline classification code
081203 ; 0835 ;
Abstract
Previous work has proposed precision-scalable accelerators to handle mixed-precision neural network (NN) inference on the edge, but these designs focus on reconfigurable MAC arrays while leaving the time-costly data-buffering procedure largely undiscussed. Moreover, integer-only inference cannot handle emerging NN models with various non-linear activation functions. In this work, we propose a mixed-precision NN accelerator supporting int8, int16, and fp32 arithmetic with two buffering techniques, MulTAP and FQPipe, which jointly enable low-latency data movement. Experimental results show that MulTAP and FQPipe speed up the baseline NN accelerator by 7.7x and 1.5x respectively, yielding application performance of 473.9 (int8) and 252.5 (int16) inferences per second (IPS) on YOLOv3-Tiny. A post-layout netlist in SMIC 40nm standard-cell technology demonstrates a design with an area of 26.96mm2 and a power estimate of 1.83W.
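The record does not include implementation details of MulTAP or FQPipe, but the mixed-precision setting the abstract describes rests on standard integer quantization. Below is a minimal illustrative sketch, not the paper's method: symmetric per-tensor quantization of fp32 activations to int8 or int16, as commonly used in mixed-precision inference. All function names here are assumptions introduced for illustration.

```python
import numpy as np

def quantize(x, num_bits):
    # Symmetric per-tensor quantization: map fp32 values onto a
    # signed integer grid of the requested width (8 or 16 bits).
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    dtype = np.int8 if num_bits == 8 else np.int16
    return q.astype(dtype), scale

def dequantize(q, scale):
    # Recover an fp32 approximation of the original tensor.
    return q.astype(np.float32) * scale

np.random.seed(0)
x = np.random.randn(4, 4).astype(np.float32)
q8, s8 = quantize(x, 8)
q16, s16 = quantize(x, 16)

# With the same scaling rule, int16 retains far more of the signal
# than int8, at the cost of wider datapaths and buffers.
err8 = np.max(np.abs(dequantize(q8, s8) - x))
err16 = np.max(np.abs(dequantize(q16, s16) - x))
```

The trade-off shown here (int16 roughly halves throughput but sharply reduces quantization error) is consistent with the abstract's reported 473.9 IPS (int8) versus 252.5 IPS (int16) on YOLOv3-Tiny.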
Pages: 5
Related Papers
50 records total
  • [11] Low-Latency Intrusion Detection Using a Deep Neural Network
    Bin Ahmad, Umair
    Akram, Muhammad Arslan
    Mian, Adnan Noor
    IT PROFESSIONAL, 2022, 24 (03) : 67 - 72
  • [12] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [13] Low-Latency Neural Stereo Streaming
    Hou, Qiqi
    Farhadzadeh, Farzad
    Said, Amir
    Sautiere, Guillaume
    Le, Hoang
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 7974 - 7984
  • [14] Low-Latency Neural Speech Translation
    Niehues, Jan
    Ngoc-Quan Pham
    Thanh-Le Ha
    Sperber, Matthias
    Waibel, Alex
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1293 - 1297
  • [15] Low-latency remote-offloading system for accelerator
    Saito, Shogo
    Fujimoto, Kei
    Shiraga, Akinori
    ANNALS OF TELECOMMUNICATIONS, 2024, 79 (3-4) : 179 - 196
  • [16] Low-latency remote-offloading system for accelerator
    Shogo Saito
    Kei Fujimoto
    Akinori Shiraga
    Annals of Telecommunications, 2024, 79 : 179 - 196
  • [17] Fair DMA Scheduler for Low-Latency Accelerator Offloading
    Otani, Ikuo
    Fujimoto, Kei
    Shiraga, Akinori
    2022 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING, ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM, 2022, : 26 - 32
  • [18] Training Low-Latency Spiking Neural Network with Orthogonal Spiking Neurons
    Yao, Yunpeng
    Wu, Man
    Zhang, Renyuan
    2023 21ST IEEE INTERREGIONAL NEWCAS CONFERENCE, NEWCAS, 2023
  • [19] EdgeDRNN: Enabling Low-latency Recurrent Neural Network Edge Inference
    Gao, Chang
    Rios-Navarro, Antonio
    Chen, Xi
    Delbruck, Tobi
    Liu, Shih-Chii
    2020 2ND IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2020), 2020, : 41 - 45
  • [20] DPSNN: spiking neural network for low-latency streaming speech enhancement
    Sun, Tao
    Bohte, Sander
    NEUROMORPHIC COMPUTING AND ENGINEERING, 2024, 4 (04)