Reliability Exploration of System-on-Chip With Multi-Bit-Width Accelerator for Multi-Precision Deep Neural Networks

Cited by: 8
Authors
Cheng, Quan [1]
Huang, Mingqiang [2]
Man, Changhai [3]
Shen, Ao
Dai, Liuyao [4]
Yu, Hao [5]
Hashimoto, Masanori [1, 6]
Affiliations
[1] Kyoto Univ, Dept Commun & Comp Engn, Kyoto 6068501, Japan
[2] Southern Univ Sci & Technol, Sch Microelect, Shenzhen 518055, Peoples R China
[3] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[4] Univ Calif Merced, Dept Elect Engn & Comp Sci, Merced, CA USA
[5] Southern Univ Sci & Technol, Sch Microelect, Shenzhen, Peoples R China
[6] Kyoto Univ, Dept Commun & Comp Engn, Kyoto, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Multi-bit-width; CNN; accelerator; NAS; reliability; FPGA; SoC; MIXED PRECISION; IMPACT; MULTIPLICATION; RADIATION; PROCESSOR; MATRIX; CORES; EDGE; SOC
DOI
10.1109/TCSI.2023.3300899
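Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract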
Deep neural networks (DNNs) in safety-critical applications demand high reliability even when running on edge-computing devices. Recent works on System-on-Chip (SoC) design with state-of-the-art (SOTA) hardware artificial intelligence (AI) accelerators, and on the corresponding multi-bit-width (MBW) convolutional neural network (CNN) generation strategies, show that MBW CNNs can effectively explore the trade-off between network accuracy and hardware efficiency. However, reliability has not been considered in such trade-off analysis, even though highly quantized CNNs may elevate the impact of bit flips in the hardware. The reliability of the microcontroller and of its interface to the AI accelerator has also not been studied. This work evaluates the reliability of DNN computation in an SoC that includes a processor, a SOTA AI accelerator, and NN models highly optimized for computation efficiency using a neural architecture search (NAS) method. Focusing on neutron-induced soft errors, which are the primary source of bit-flip errors in a terrestrial environment, we perform fault injection and neutron beam experiments. For these experiments, we prototype the SoC on a flash-based FPGA platform, in which the configuration memory is robust to neutron irradiation. We then analyze the experimental data and identify the vulnerable components in the system. Furthermore, we evaluate how running different NAS-optimized MBW LeNet5 networks on the SoC affects the performance, radiation sensitivity, failure rate of the MBW accelerator, and crash rate of the system on the FPGAs. Our results show that the instruction and data tightly coupled memories (I/DTCM) are the most vulnerable parts and that the control status registers (CSRs) in our accelerator are the second most vulnerable component. Moreover, MBW networks are more susceptible to critical errors than single-precision networks, low-precision data are more likely to affect the classification results, and the high-order bits are more sensitive to faults.
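The closing observation of the abstract, that high-order bits are more sensitive to faults, can be illustrated with a minimal sketch of a single-bit flip in a quantized value. The code below is not the paper's fault-injection framework: the function names `flip_bit` and `flip_errors` are hypothetical, and it assumes values are stored as signed two's-complement integers of a given bit width.

```python
def flip_bit(value: int, bit: int, width: int) -> int:
    """Flip one bit of a signed two's-complement integer of `width` bits.

    Hypothetical helper for illustration; the two's-complement storage
    format is an assumption, not taken from the paper.
    """
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError("value does not fit in the given width")
    raw = value & ((1 << width) - 1)   # encode as an unsigned bit pattern
    raw ^= 1 << bit                    # inject the single-bit fault
    if raw >= 1 << (width - 1):        # decode back to a signed value
        raw -= 1 << width
    return raw


def flip_errors(value: int, width: int) -> list[int]:
    """Absolute error caused by flipping each bit, from LSB to MSB."""
    return [abs(flip_bit(value, b, width) - value) for b in range(width)]


if __name__ == "__main__":
    # The same stored value 3, held at 4-bit and at 8-bit precision.
    print(flip_errors(3, 4))
    print(flip_errors(3, 8))
```

In this model a flip at bit position b changes the stored value by 2^b, so the error doubles with each higher bit position. How such an error propagates to a classification result depends on the network and the accelerator, which this sketch does not model.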
Pages: 3978-3991
Page count: 14