Soft Error Tolerant Convolutional Neural Networks on FPGAs With Ensemble Learning

Cited by: 18
Authors
Gao, Zhen [1]
Zhang, Han [2]
Yao, Yi [1]
Xiao, Jiajun [1]
Zeng, Shulin [3]
Ge, Guangjun [3]
Wang, Yu [3]
Ullah, Anees [4]
Reviriego, Pedro [5]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Tianjin Int Engn Inst, Tianjin 300072, Peoples R China
[3] Tsinghua Univ, Sch Elect Engn, Beijing 100084, Peoples R China
[4] Univ Engn & Technol, Dept Elect Engn, Peshawar 220101, Abbottabad, Pakistan
[5] Univ Carlos III Madrid, Dept Telemat Engn, Leganes 28911, Spain
Funding
National Natural Science Foundation of China
Keywords
Convolutional neural networks; Reliability; Field programmable gate arrays; Random access memory; Neural networks; Fault tolerant systems; Convolution; Convolutional neural networks (CNNs); ensemble; fault injection; field-programmable gate array (FPGA) accelerator; soft error tolerance; FAULT; RELIABILITY; CNN; ACCELERATORS; RADIATION;
DOI
10.1109/TVLSI.2021.3138491
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Convolutional neural networks (CNNs) are widely used in computer vision and natural language processing, and field-programmable gate arrays (FPGAs) are popular accelerators for them. In critical applications, however, the reliability of FPGA-based CNNs becomes a priority because FPGAs are prone to soft errors. Traditional protection schemes, such as triple modular redundancy (TMR), introduce a large overhead that is not acceptable on resource-limited platforms. This article proposes using an ensemble of weak CNNs to build a robust classifier at low cost. To obtain a group of base CNNs with low complexity and balanced similarity and diversity, residual neural networks (ResNets) with different numbers of layers (20/32/44/56) are combined in the ensemble system to replace a single strong ResNet 110. In addition, a robust combiner is designed based on the reliability evaluation of a single ResNet. Single ResNets with different numbers of layers and different ensemble schemes are implemented on an FPGA accelerator based on the Xilinx Zynq 7000 SoC. The reliability of the ensemble systems is evaluated on a large-scale fault injection platform and compared with that of the TMR-protected ResNet 110 and ResNet 20. Experimental results show that the proposed ensembles effectively improve system reliability under soft errors, with much lower overhead than TMR.
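The contrast between TMR and an ensemble combiner can be illustrated with a minimal Python sketch. It is not the combiner from the article, whose design is not described in this record. The function names `tmr_vote` and `ensemble_combine` are hypothetical, and both the score averaging and the rule of discarding non-finite outputs are assumptions made purely for illustration.

```python
from collections import Counter
import math


def tmr_vote(labels):
    """Majority vote over the labels of three identical replicas.

    Returns the label chosen by at least two replicas, or None when
    all three disagree.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


def ensemble_combine(score_vectors):
    """Average per-class scores from several different base models.

    Assumption (not from the article): a base model whose score vector
    contains NaN or infinity is treated as corrupted and skipped.
    Returns the index of the winning class, or None if no vector is usable.
    """
    valid = [v for v in score_vectors if all(math.isfinite(x) for x in v)]
    if not valid:
        return None
    n_classes = len(valid[0])
    mean = [sum(v[c] for v in valid) / len(valid) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)
```

In this sketch TMR runs three copies of the same network and votes on their labels, whereas the ensemble merges the scores of different, smaller networks, so one faulty member can be outvoted or discarded without tripling a large model.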
Pages: 291-302
Page count: 12