Systematic Reliability Evaluation of FPGA Implemented CNN Accelerators

Cited by: 7
Authors
Gao, Zhen [1 ]
Gao, Shihui [1 ]
Yao, Yi [1 ]
Liu, Qiang [2 ]
Zeng, Shulin [3 ]
Ge, Guangjun [3 ]
Wang, Yu [3 ]
Ullah, Anees [4 ]
Reviriego, Pedro [5 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Microelect, Tianjin 300072, Peoples R China
[3] Tsinghua Univ, Sch Elect Engn, Beijing 100084, Peoples R China
[4] Univ Engn & Technol, Dept Elect Engn, Abbottabad Campus, Abbottabad 220101, Pakistan
[5] Univ Politecn Madrid, Dept Ingeneria Sistemas Telemat, Madrid 28040, Spain
Funding
National Natural Science Foundation of China
Keywords
Reliability; Field programmable gate arrays; Parallel processing; Convolutional neural networks; Reliability engineering; Computer architecture; Neural networks; Convolutional neural networks (CNNs); FPGA accelerator; reliability; single bit upsets (SEUs); fault injection; RADIATION;
DOI
10.1109/TDMR.2023.3235767
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Convolutional neural networks (CNNs) have become essential for many scientific and industrial applications, such as image classification and pattern detection. Among the devices that can implement neural networks, SRAM-based FPGAs are a popular option due to their excellent parallel computing capability and flexibility. However, SRAM-based FPGAs are susceptible to radiation effects, which limits their use in safety-critical applications. In this paper, the reliability of an accelerator based on an advanced instruction set architecture (ISA) is evaluated through hardware fault injection experiments. Each main module of the accelerator is evaluated separately, and the impact of parallelism and model features on the accelerator reliability is also examined. The experimental results lead to several conclusions about general hardware reliability and about the reliability of particular models. First, over 99% of SEUs on the computation modules will cause accuracy loss, and the reliability improves for higher parallelism. Second, a large portion of SEUs on the data mover and the instruction scheduler will cause system corruption due to abnormal interactions with the ARM processor or other modules. Third, nonlinear activation and pooling layers are effective in reducing the effect of SEUs on computation modules, so models that use these layers tend to be more robust. The results provide an in-depth understanding of the impact of errors on CNNs implemented on ISA-based FPGA accelerators (e.g., the Xilinx DPU).
Pages: 116-126
Page count: 11