BinFI: An Efficient Fault Injector for Safety-Critical Machine Learning Systems

Times cited: 52
Authors
Chen, Zitao [1]
Li, Guanpeng [1]
Pattabiraman, Karthik [1]
DeBardeleben, Nathan [2]
Affiliations
[1] Univ British Columbia, Vancouver, BC, Canada
[2] Los Alamos Natl Lab, Los Alamos, NM USA
Source
PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS | 2019
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Error Resilience; Machine Learning; Fault Injection; NEURAL-NETWORKS;
DOI
10.1145/3295500.3356177
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
As machine learning (ML) becomes pervasive in high-performance computing, ML has found its way into safety-critical domains (e.g., autonomous vehicles). Thus, the reliability of ML has grown in importance. Specifically, failures of ML systems can have catastrophic consequences, and they can be caused by soft errors, which are becoming more frequent as systems scale. Therefore, we need to evaluate ML systems in the presence of soft errors. In this work, we propose BinFI, an efficient fault injector (FI) for finding the safety-critical bits in ML applications. We find that widely used ML computations are often monotonic. Thus, we can approximate the error propagation behavior of an ML application as a monotonic function. BinFI uses a binary-search-like FI technique to pinpoint the safety-critical bits (and also to measure the overall resilience). BinFI identifies 99.56% of safety-critical bits (with 99.63% precision) in the evaluated systems. It significantly outperforms random FI at much lower cost.
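The core idea behind the binary-search-like FI can be illustrated with a small sketch. Suppose flips are ordered so that the induced deviation grows along the list, and the program's output is monotonic in its input. Then the safety-critical flips form a suffix of that list, and O(log n) injections locate the boundary instead of n. The sketch below rests on several assumptions that are not the authors' implementation. The "ML application" is a hypothetical single neuron `relu(w * x)`. "Safety-critical" means a hypothetical fixed deviation `threshold`. Only 0-to-1 mantissa flips of a positive float32 are considered. All function names (`float32_bits`, `flip_bit`, `first_critical`, `binfi_sketch`) are illustrative.

```python
import struct
from typing import Callable, List, Tuple


def float32_bits(value: float) -> int:
    """Return the raw 32-bit integer encoding of `value` as an IEEE-754 float32."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit in the float32 encoding of `value` (a simulated soft error)."""
    return struct.unpack("<f", struct.pack("<I", float32_bits(value) ^ (1 << bit)))[0]


def first_critical(candidates: List[int], is_critical: Callable[[int], bool]) -> int:
    """Binary search for the index of the first critical flip in `candidates`.

    Assumes `candidates` is ordered by increasing induced deviation and that the
    outcome is monotonic, so critical flips form a suffix of the list.
    Returns len(candidates) if no candidate is critical.
    """
    lo, hi = 0, len(candidates)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_critical(candidates[mid]):
            hi = mid  # boundary is at mid or earlier
        else:
            lo = mid + 1  # every flip up to mid is benign
    return lo


def binfi_sketch(x: float, w: float, threshold: float) -> Tuple[List[int], int]:
    """Toy run on out = relu(w * x): return (critical bits, injections performed).

    Hypothetical criterion: a flip is critical if |faulty - golden| > threshold.
    """
    golden = max(0.0, w * x)
    injections = 0

    def is_critical(bit: int) -> bool:
        nonlocal injections
        injections += 1  # each call stands in for one fault-injection run
        faulty = max(0.0, w * flip_bit(x, bit))
        return abs(faulty - golden) > threshold

    # For positive x, setting a 0 mantissa bit to 1 raises x by
    # 2**(bit - 23) * 2**exponent, so ascending bit order = ascending deviation.
    raw = float32_bits(x)
    candidates = [b for b in range(23) if not (raw >> b) & 1]
    idx = first_critical(candidates, is_critical)
    return candidates[idx:], injections


if __name__ == "__main__":
    critical, n = binfi_sketch(1.5, 2.0, 0.01)
    print(critical, n)
```

For `x = 1.5`, 22 mantissa bits are 0, so exhaustive injection would take 22 runs. The binary search above finds the same critical set (bits 16 to 21) in 5 runs. The real technique also has to handle 1-to-0 flips, sign and exponent bits, and operators that are only approximately monotonic. That is why the abstract reports recall and precision slightly below 100%.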
Pages: 23