A machine-learning-guided framework for fault-tolerant DNNs

Cited by: 4
Authors
Traiola, Marcello [1]
Kritikakou, Angeliki [1]
Sentieys, Olivier [1]
Affiliations
[1] Univ Rennes, INRIA, CNRS, IRISA, Rennes, France
Source
2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE | 2023
Keywords
Reliability Analysis; Fault Tolerance; Machine Learning; Neural Networks; ERROR;
DOI
10.23919/DATE56975.2023.10137033
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Deep Neural Networks (DNNs) show promising performance in several application domains. Nevertheless, DNN results may be incorrect, not only because of the network's intrinsic inaccuracy but also because of faults affecting the hardware. Ensuring the fault tolerance of DNNs is crucial, but common fault-tolerance approaches are not cost-effective because their overheads are prohibitive for large DNNs. This work proposes a comprehensive framework to assess the fault tolerance of DNN parameters and protect them cost-effectively. As a first step, the framework performs a statistical fault injection. In the second step, the results are used with classification-based machine learning methods to obtain a bit-accurate prediction of the criticality of all network parameters. Last, Error Correction Codes (ECCs) are selectively inserted to protect only the critical parameters, which keeps the cost low. Using the proposed framework, we explored and protected two Convolutional Neural Networks (CNNs), each with four different data encodings. The results show that it is possible to protect the critical network parameters with selective ECCs while saving up to 79% memory with respect to conventional ECC approaches.
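The two mechanical ingredients named in the abstract, injecting a bit-level fault into a stored parameter and counting the check bits that selective ECC would add, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the float32 encoding, the plain single-error-correcting Hamming code, and the 25% critical fraction are all assumptions chosen for illustration.

```python
import numpy as np


def flip_bit(weights: np.ndarray, index: int, bit: int) -> np.ndarray:
    """Return a float32 copy of 1-D `weights` with one bit of one element flipped."""
    faulty = np.array(weights, dtype=np.float32)  # always a fresh copy
    raw = faulty.view(np.uint32)  # same bytes, seen as 32-bit integers
    raw[index] ^= np.uint32(1 << bit)
    return faulty


def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting Hamming code)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def selective_ecc_bits(n_critical: int, data_bits: int) -> int:
    """Check bits stored when only the critical words carry a Hamming code."""
    return n_critical * hamming_check_bits(data_bits)


# Hypothetical parameters, assumed to be stored as IEEE 754 float32.
weights = np.array([0.5, -0.25, 1.0], dtype=np.float32)
sign_fault = flip_bit(weights, 2, 31)      # flips the sign bit of 1.0
exponent_fault = flip_bit(weights, 0, 30)  # flips the top exponent bit of 0.5

full = selective_ecc_bits(1000, 32)     # every one of 1000 words protected
partial = selective_ecc_bits(250, 32)   # assumption: 25% of words flagged critical
```

A flip in the top exponent bit turns 0.5 into a value of magnitude around 1e38, while a flip in a low mantissa bit barely changes the value; this difference in impact is why a per-bit criticality prediction can justify protecting only part of the parameter memory.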
Pages: 2