AOS: An Automated Overclocking System for High-Performance CNN Accelerator Through Timing Delay Measurement on FPGA

Cited by: 4
Authors
Jiang, Weixiong [1 ,2 ]
Yu, Heng [3 ]
Chen, Fupeng [1 ,2 ]
Ha, Yajun [4 ,5 ]
Affiliations
[1] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 101408, Peoples R China
[3] Univ Nottingham Ningbo China, Sch Comp Sci, Ningbo 315100, Peoples R China
[4] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
[5] ShanghaiTech Univ, Shanghai Engn Res Ctr Energy Efficient & Custom AI, Shanghai 201210, Peoples R China
Keywords
convolutional neural network (CNN); fault tolerance; FPGA; overclocking; ENERGY
DOI
10.1109/TCAD.2023.3235803
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Given the inherent algorithmic error resilience of convolutional neural networks (CNNs) and the worst-case design methodologies of current electronic design automation tools, overclocking-based timing speculation is a promising technique for improving the performance of CNN accelerators on FPGAs by removing unnecessary timing margins. To avoid potential timing errors, timing delay measurement should be used during overclocking. However, current approaches struggle to measure paths subject to stronger variability factors such as jitter, and they lack an automated process for testing circuit delays. In this article, we first propose two-dimensional multiframe fusion to deal with sampling jitter, then present a timing delay measurement-based automatic overclocking system (AOS) running on a heterogeneous FPGA for high-performance CNN accelerators. On the FPGA side, AOS is composed of timing delay monitors (TDMs) that can measure all types of timing paths and a TDM controller that converts the sampled TDM values into a timing delay expressed as the ratio of path delay to clock period. On the CPU side, AOS converts the path delay from a clock-period ratio to an absolute delay value and decides the accelerator frequency for the next iteration. We demonstrate AOS with a SkyNet accelerator on the Xilinx ZCU104 board and achieve 657 FPS at 436 MHz without accuracy degradation, which is 1.41x the baseline performance.
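The CPU-side step described in the abstract (turning a delay-to-period ratio into an absolute delay, then choosing the next clock frequency) can be illustrated with a minimal Python sketch. The abstract does not state the actual decision rule, so the rule below (run the slowest measured path with a fixed guard band) is an assumption, and the names `ratio_to_delay_ns`, `next_frequency_mhz`, and `guard_band` are hypothetical.

```python
def ratio_to_delay_ns(ratio: float, freq_mhz: float) -> float:
    """Convert a path delay given as a fraction of the clock period to nanoseconds."""
    period_ns = 1000.0 / freq_mhz  # clock period in ns at the current frequency
    return ratio * period_ns


def next_frequency_mhz(ratios, freq_mhz: float, guard_band: float = 0.05) -> float:
    """Pick a next frequency from measured ratios (assumed rule, not the paper's).

    The slowest path sets the limit; guard_band is a hypothetical safety
    margin expressed as a fraction of that path's delay.
    """
    worst_ns = max(ratio_to_delay_ns(r, freq_mhz) for r in ratios)
    return 1000.0 / (worst_ns * (1.0 + guard_band))


if __name__ == "__main__":
    # Illustrative numbers only: two paths measured at 200 MHz.
    measured = [0.25, 0.5]
    print(ratio_to_delay_ns(0.5, 200.0))
    print(next_frequency_mhz(measured, 200.0, guard_band=0.0))
```

In an iterative setup of this kind, the returned frequency would be applied to the accelerator clock and the measurement repeated, since path delays are re-sampled at each new operating point.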
Pages: 2952-2965
Page count: 14