Efficient Fault-Criticality Analysis for AI Accelerators using a Neural Twin

Cited by: 12
Authors
Chaudhuri, Arjun [1 ]
Chen, Ching-Yuan [1 ]
Talukdar, Jonti [1 ]
Madala, Siddarth [2 ,3 ]
Dubey, Abhishek Kumar [4 ]
Chakrabarty, Krishnendu [1 ]
Affiliations
[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
[2] Duke Univ, Dept Stat, Durham, NC USA
[3] Duke Univ, Dept Comp Sci, Durham, NC 27706 USA
[4] NCI, Ctr Canc Res, Bethesda, MD 20892 USA
Source
2021 IEEE INTERNATIONAL TEST CONFERENCE (ITC 2021) | 2021
DOI
10.1109/ITC50571.2021.00015
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Owing to the inherent fault tolerance of deep neural network (DNN) models used for classification, many structural faults in the processing elements (PEs) of a systolic array-based AI accelerator are functionally benign. Brute-force fault simulation for determining fault criticality is computationally expensive due to the large number of potential fault sites in the accelerator array and the dependence of PE criticality characterization on the functional input data. Supervised learning techniques can be used to accurately estimate fault criticality, but they require ground truth for model training, and ground-truth collection involves extensive and computationally expensive fault simulations. We present a framework for analyzing fault criticality with a negligible amount of ground-truth data. We incorporate the gate-level structural and functional information of the PEs in their "neural twins", referred to as "PE-Nets". The PE netlist is translated into a trainable PE-Net, where the standard-cell instances are substituted by their corresponding "Cell-Nets" and the wires translate to neural connections. Each Cell-Net is a pre-trained DNN that models the Boolean-logic behavior of the corresponding standard cell. In the PE-Net, every neural connection is associated with a bias that represents a perturbation in the signal propagated by that connection. We utilize a recently proposed misclassification-driven training algorithm to sensitize and identify biases that are critical to the functioning of the accelerator for a given application workload. The proposed framework achieves up to 100% accuracy in fault-criticality classification in 16-bit and 32-bit PEs by using the criticality knowledge of only 2% of the total faults in a PE.
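As an illustration of the neural-twin idea only (this is not the authors' implementation; the names CellNet and fault_bias are hypothetical), the following PyTorch sketch pre-trains a tiny Cell-Net on the truth table of a 2-input NAND standard cell, freezes it, and attaches a trainable bias to one input connection. The gradient of a mismatch-driven loss with respect to that bias hints at how strongly a perturbation on that wire can flip the cell output, in the spirit of the misclassification-driven sensitization described in the abstract.

import torch
import torch.nn as nn

class CellNet(nn.Module):
    # Small MLP approximating one standard cell (here: a 2-input NAND gate).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 8), nn.ReLU(),
            nn.Linear(8, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

# Pre-train the Cell-Net on the cell's truth table (NAND).
cell = CellNet()
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[1.], [1.], [1.], [0.]])
opt = torch.optim.Adam(cell.parameters(), lr=0.05)
for _ in range(500):
    opt.zero_grad()
    nn.functional.binary_cross_entropy(cell(x), y).backward()
    opt.step()

# Freeze the Cell-Net; attach a trainable bias to one input connection.
# The bias stands in for a fault-induced perturbation on that wire.
for p in cell.parameters():
    p.requires_grad_(False)
fault_bias = torch.zeros(1, requires_grad=True)

inp = torch.tensor([[1., 1.]])  # one functional input pattern
perturbed = inp + torch.cat([fault_bias, torch.zeros(1)]).unsqueeze(0)
out = cell(perturbed)

# Misclassification-driven in spirit: drive the cell output toward the wrong
# value (1 instead of NAND(1,1) = 0). A large gradient magnitude on fault_bias
# suggests the connection is easy to sensitize, i.e., likely critical for this input.
wrong_target = torch.tensor([[1.]])
loss = nn.functional.binary_cross_entropy(out, wrong_target)
loss.backward()
print("gradient w.r.t. fault bias:", fault_bias.grad.item())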
Pages: 73-82
Number of pages: 10