harDNNing: a machine-learning-based framework for fault tolerance assessment and protection of DNNs

Cited by: 5
Authors
Traiola, Marcello [1]
Kritikakou, Angeliki [1]
Sentieys, Olivier [1]
Affiliations
[1] Univ Rennes, CNRS, INRIA, IRISA, Rennes, France
Source
2023 IEEE EUROPEAN TEST SYMPOSIUM, ETS | 2023
Keywords
Reliability Analysis; Fault Tolerance; Machine Learning; Neural Networks
DOI
10.1109/ETS56758.2023.10174178
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep Neural Networks (DNNs) show promising performance in several application domains, such as robotics, aerospace, smart healthcare, and autonomous driving. Nevertheless, DNN results may be incorrect, not only because of the network's intrinsic inaccuracy, but also because of faults affecting the hardware. Indeed, hardware faults may impact the DNN inference process and lead to prediction failures. Therefore, ensuring the fault tolerance of DNNs is crucial. However, common fault tolerance approaches are not cost-effective for DNN protection, because of the prohibitive overheads due to the large size of DNNs and the memory required for parameter storage. In this work, we propose a comprehensive framework to assess the fault tolerance of DNNs and cost-effectively protect them. As a first step, the proposed framework performs datatype-and-layer-based fault injection, driven by the DNN characteristics. As a second step, it uses classification-based machine learning methods to predict the criticality, not only of network parameters, but also of their bits. Last, dedicated Error Correction Codes (ECCs) are selectively inserted to protect the critical parameters and bits, hence protecting the DNNs at low cost. Using the proposed framework, we explored and protected two Convolutional Neural Networks (CNNs), each with four different data encodings. The results show that it is possible to protect the critical network parameters with selective ECCs while saving up to 83% of memory with respect to conventional ECC approaches.
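The abstract does not give implementation details, so the following is only a minimal Python sketch of two ideas it mentions: injecting a single bit-flip fault into a stored parameter, and estimating how many check bits a protection code needs. It assumes IEEE 754 float32 parameters and a textbook single-error-correcting Hamming code; the function names flip_bit and hamming_parity_bits are hypothetical and are not taken from the paper or its framework.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its float32 encoding flipped.

    Assumption: parameters are stored as IEEE 754 float32; bit 0 is the
    least significant mantissa bit and bit 31 is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as float32.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def hamming_parity_bits(data_bits: int) -> int:
    """Check bits needed by a single-error-correcting Hamming code.

    This is the smallest r such that 2**r >= data_bits + r + 1.
    Assumption: a plain Hamming code, used here only for illustration.
    """
    if data_bits < 1:
        raise ValueError("data_bits must be positive")
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


if __name__ == "__main__":
    weight = 1.0
    for bit in (0, 30, 31):
        print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)}")
    for width in (8, 32):
        print(f"{width} data bits need {hamming_parity_bits(width)} check bits")
```

In this sketch, flipping the lowest mantissa bit barely changes the value, while flipping a high exponent bit or the sign bit changes it drastically, which is one intuition for why criticality can differ from bit to bit. Covering fewer bits also needs fewer check bits, which is the general idea behind selective protection; the overhead figures in the abstract come from the authors' own ECC design and experiments, not from this calculation.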
Pages: 6