Resilience of deep learning applications: A systematic literature review of analysis and hardening techniques

Cited by: 0
Authors
Bolchini, Cristiana [1]
Cassano, Luca [1]
Miele, Antonio [1]
Affiliations
[1] Politecn Milan, Dip Elettron Informaz & Bioingn, Pzza L Da Vinci 32, I-20133 Milan, Italy
Keywords
Convolutional Neural Network; Deep Learning; Deep Neural Network; Fault tolerance; Resilience analysis; Hardening; Hardware faults; CONVOLUTIONAL NEURAL-NETWORKS; FAULT-TOLERANCE; PRECISION; RELIABILITY; IMPACT
DOI
10.1016/j.cosrev.2024.100682
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Machine Learning (ML) is currently being exploited in numerous applications, being one of the most effective Artificial Intelligence (AI) technologies used in diverse fields, such as vision and autonomous systems. This trend has motivated a significant number of contributions on the analysis and design of ML applications that are resilient to faults affecting the underlying hardware. The authors systematically investigate the existing body of knowledge on the resilience of Deep Learning (among ML techniques) against hardware faults, through a review that presents the strengths and weaknesses of this literature stream and sets out future avenues of research. The review covers 85 scientific articles published between January 2019 and March 2024, selected after carefully analysing 222 contributions (from an initial screening of 244 eligible publications). The authors adopt a classification framework to interpret and highlight research similarities and peculiarities, based on several parameters, ranging from the main scope of the work and the adopted fault and error models to reproducibility. This framework allows for a comparison of the different solutions and the identification of possible synergies. Furthermore, suggestions concerning the future direction of research are proposed in the form of open challenges to be addressed.
Pages: 21