Reliability Exploration of System-on-Chip With Multi-Bit-Width Accelerator for Multi-Precision Deep Neural Networks

Times cited: 8

Authors:

Cheng, Quan [1]

Huang, Mingqiang [2]

Man, Changhai [3]

Shen, Ao

Dai, Liuyao [4]

Yu, Hao [5]

Hashimoto, Masanori [1, 6]

Affiliations:

[1] Kyoto Univ, Dept Commun & Comp Engn, Kyoto 6068501, Japan

[2] Southern Univ Sci & Technol, Sch Microelect, Shenzhen 518055, Peoples R China

[3] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

[4] Univ Calif Merced, Dept Elect Engn & Comp Sci, Merced, CA USA

[5] Southern Univ Sci & Technol, Sch Microelect, Shenzhen, Peoples R China

[6] Kyoto Univ, Dept Commun & Comp Engn, Kyoto, Japan

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS | 2023 / Vol. 70 / No. 10

Funding:

Japan Society for the Promotion of Science (JSPS)

Keywords:

Multi-bit-width; CNN; accelerator; NAS; reliability; FPGA; SoC; MIXED PRECISION; IMPACT; MULTIPLICATION; RADIATION; PROCESSOR; MATRIX; CORES; EDGE; SOC

DOI:

10.1109/TCSI.2023.3300899

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification code:

0808; 0809

Abstract:

Deep neural networks (DNNs) in safety-critical applications demand high reliability even when running on edge-computing devices. Recent work on System-on-Chip (SoC) design with state-of-the-art (SOTA) hardware artificial intelligence (AI) accelerators and corresponding multi-bit-width (MBW) convolutional neural network (CNN) generation strategies shows that MBW CNNs can effectively explore the trade-off between network accuracy and hardware efficiency. However, reliability has not been considered in such trade-off analyses, even though highly quantized CNNs may amplify the impact of bit flips in the hardware. In addition, the reliability of the microcontroller and of its interface to the AI accelerator has not been studied. This work evaluates the reliability of DNN computation in an SoC that includes a processor, a SOTA AI accelerator, and NN models highly optimized for computational efficiency using a neural architecture search (NAS) method. Focusing on neutron-induced soft errors, the primary source of bit-flip errors in terrestrial environments, we perform fault injection and neutron beam experiments. For these experiments, we prototype the SoC on a flash-based FPGA platform, whose configuration memory is robust to neutron irradiation. We then analyze the experimental data and identify the vulnerable components in the system. Furthermore, we evaluate how running different NAS-optimized MBW LeNet5 networks on the SoC affects the performance, radiation sensitivity, failure rate of the MBW accelerator, and crash rate of the system on the FPGAs. Our results show that the instruction and data tightly coupled memories (I/DTCM) are the most vulnerable parts, and the control status registers (CSRs) in our accelerator are the second most vulnerable component. Moreover, MBW networks are more susceptible to critical errors than single-precision networks, low-precision data are more likely to affect the classification results, and high-order bits are more sensitive to faults.
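The abstract reports that high-order bits are more sensitive to faults. The following is a minimal sketch, not the authors' fault-injection framework, that illustrates one reason under a simplifying assumption: if quantized CNN data are stored as signed two's-complement integers of a given bit-width (for example int4 or int8), a single-bit upset at position b changes the stored integer by exactly 2^b, so a flip in a high-order bit perturbs the value far more than one in a low-order bit. The function names `flip_bit` and `flip_errors` are hypothetical and chosen only for illustration.

```python
def flip_bit(value: int, bit: int, width: int) -> int:
    """Return `value` with one bit flipped, treating it as a signed
    two's-complement integer of `width` bits (assumed storage format)."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    mask = (1 << width) - 1
    raw = (value & mask) ^ (1 << bit)  # flip the bit in the unsigned bit pattern
    # Reinterpret the pattern as signed: a set sign bit means a negative value.
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw


def flip_errors(value: int, width: int) -> list[int]:
    """Absolute error caused by a single-bit upset at each bit position."""
    return [abs(flip_bit(value, b, width) - value) for b in range(width)]


if __name__ == "__main__":
    # Two example bit-widths, as might coexist in a multi-bit-width network.
    for width in (4, 8):
        print(width, flip_errors(3, width))
```

Because the error doubles with each bit position, an upset in the most significant (sign) bit shifts the value by half of the representable range regardless of the bit-width, whereas an upset in the least significant bit moves it by a single quantization step.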

Pages: 3978-3991

Number of pages: 14

Number of references: 51
