AOS: An Automated Overclocking System for High-Performance CNN Accelerator Through Timing Delay Measurement on FPGA

Cited by: 4

Authors:

Jiang, Weixiong [1,2]

Yu, Heng [3]

Chen, Fupeng [1,2]

Ha, Yajun [4,5]

Affiliations:

[1] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China

[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 101408, Peoples R China

[3] Univ Nottingham Ningbo China, Sch Comp Sci, Ningbo 315100, Peoples R China

[4] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China

[5] ShanghaiTech Univ, Shanghai Engn Res Ctr Energy Efficient & Custom AI, Shanghai 201210, Peoples R China

Source:

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS | 2023, Vol. 42, No. 9

Keywords:

convolutional neural network (CNN); fault tolerance; FPGA; overclocking; energy

DOI:

10.1109/TCAD.2023.3235803

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

With the inherent algorithmic error resilience of convolutional neural networks (CNNs) and the worst-case design methodologies of current electronic design automation tools, overclocking-based timing speculation is a promising technique to improve the performance of CNN accelerators on FPGA by removing unnecessary timing margins. To avoid potential timing errors, timing delay measurement should be used during overclocking. However, current approaches struggle to measure paths subject to stronger variability factors such as jitter, and they lack an automated process for testing circuit delays. In this article, we first propose two-dimensional multiframe fusion to deal with the sampling jitter, then present a timing delay measurement-based automatic overclocking system (AOS) running on a heterogeneous FPGA for high-performance CNN accelerators. On the FPGA side, AOS is composed of timing delay monitors (TDMs) that can measure all types of timing paths and a TDM controller that converts the sampled values of the TDMs into timing delay, expressed as the ratio of path delay to the clock period. On the CPU side, AOS converts the path delay from a clock period ratio to an absolute delay value and decides the frequency of the accelerator in the next iteration. We demonstrate AOS with a SkyNet accelerator on the Xilinx ZCU104 board and achieve 657 FPS at 436 MHz without accuracy degradation, which is 1.41× the performance of the baseline.
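The CPU-side step described in the abstract (turning a delay-to-period ratio into an absolute delay, then choosing the next clock frequency) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `margin` parameter, the optional `f_max_mhz` cap, and the rule "next period = worst measured delay plus a safety margin" are all assumptions made for illustration only.

```python
from typing import Iterable, Optional


def ratio_to_delay_ns(delay_ratio: float, clock_mhz: float) -> float:
    """Convert a path delay given as a fraction of the clock period to nanoseconds."""
    period_ns = 1000.0 / clock_mhz  # 1 MHz corresponds to a 1000 ns period
    return delay_ratio * period_ns


def next_frequency_mhz(
    delay_ratios: Iterable[float],
    clock_mhz: float,
    margin: float = 0.05,               # hypothetical safety margin (5%)
    f_max_mhz: Optional[float] = None,  # hypothetical upper bound on the clock
) -> float:
    """Pick the next clock so the worst measured path still fits, with margin.

    Assumed policy (not taken from the paper): the next period equals the
    largest absolute path delay scaled by (1 + margin).
    """
    worst_ns = max(ratio_to_delay_ns(r, clock_mhz) for r in delay_ratios)
    target_period_ns = worst_ns * (1.0 + margin)
    f_next = 1000.0 / target_period_ns
    if f_max_mhz is not None:
        f_next = min(f_next, f_max_mhz)
    return f_next


if __name__ == "__main__":
    # Hypothetical measurements: paths use 50% and 80% of a 250 MHz period.
    print(next_frequency_mhz([0.5, 0.8], clock_mhz=250.0))
```

In this sketch a path that occupies 80% of a 4 ns period has an absolute delay of 3.2 ns, so the clock can be raised until the period approaches that delay plus the chosen margin. A real system would repeat the measurement after each frequency change, as the abstract's "next iteration" wording suggests.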


Pages: 2952-2965

Page count: 14
