DiagNNose: Toward Error Localization in Deep Learning Hardware-Based on VTA-TVM Stack

Cited by: 0
Authors
Kundu, Shamik [1]
Banerjee, Suvadeep [2]
Raha, Arnab [3]
Natarajan, Suriyaprakash [4]
Basu, Kanad [1]
Affiliations
[1] Univ Texas Dallas, Dept Elect & Comp Engn, Richardson, TX 75080 USA
[2] Intel Corp, Strateg CAD Labs, Santa Clara, CA 95054 USA
[3] Intel Corp, Adv Architecture Res Grp, Santa Clara, CA 95054 USA
[4] Intel Corp, Mfg & Prod Engn, Santa Clara, CA 95054 USA
Keywords
Circuit faults; Location awareness; Life estimation; Computational modeling; Tensors; Feature extraction; Deep learning; Deep learning (DL); fault diagnosis; functional safety (FuSa); versatile tensor accelerator (VTA)
DOI
10.1109/TCAD.2023.3303851
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Low-level hardware faults that manifest in a deep learning (DL) accelerator cause an ungraceful degradation of high-level classification accuracy, which can lead to catastrophic outcomes. This violates the Functional Safety (FuSa) of the DL accelerator, which must be maintained in high-assurance applications. Conventional error localization techniques incur high test effort and do not account for the unique challenges posed by DL systems. To address this, we propose DiagNNose, a two-tier machine learning-based error localization framework for on-line fault management in DL accelerators. We develop a novel diagnostic pattern selection algorithm to obtain a minimal subset of functional test patterns that are executed on the accelerator in mission mode. By extracting and analyzing dataflow-based features from the intermediate computations of the general matrix multiply (GEMM) core, a lightweight multilayer perceptron accomplishes bit-level error localization in 8-bit, 16-bit, and 32-bit datapath units with high fidelity. We limit our evaluation of the proposed DiagNNose framework to a single accelerator design, the versatile tensor accelerator (VTA) architecture. When executing state-of-the-art deep neural networks trained on ImageNet, error localization using only 30 diagnostic functional test patterns demonstrates up to 98.4% diagnosability, an improvement of 54.63% over a random test pattern set, with as low as 4.95% overhead in the DL accelerator in mission mode.
Pages: 217-229
Page count: 13