Training-Free Stuck-At Fault Mitigation for ReRAM-Based Deep Learning Accelerators

Cited by: 4
Authors
Quan, Chenghao [1]
Fouda, Mohammed E. [2]
Lee, Sugil [1]
Jung, Giju [1]
Lee, Jongeun [1]
Eltawil, Ahmed E. [3]
Kurdahi, Fadi [2]
Affiliations
[1] Ulsan Natl Inst Sci & Technol, Dept Elect Engn, Ulsan 44919, South Korea
[2] Univ Calif Irvine, Ctr Embedded & Cyber Phys Syst, Irvine, CA 92697 USA
[3] King Abdullah Univ Sci & Technol, CEMSE Div, Thuwal 23955, Saudi Arabia
Keywords
Accelerator; artificial neural network; batch normalization (BN); ReRAM crossbar array; stuck-at fault (SAF); IR-DROP; EFFICIENT;
DOI
10.1109/TCAD.2022.3222288
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Although resistive RAMs (ReRAMs) can support highly efficient matrix-vector multiplication, which is very useful for machine learning and other applications, nonideal hardware behavior, such as stuck-at faults (SAFs) and IR drop, is an important concern in building ReRAM crossbar array-based deep learning accelerators. Previous work has addressed the nonideality problem through either hardware redundancy, which permanently increases hardware cost, or software retraining, which may be even more costly or unacceptable because it needs a training dataset and incurs high computation overhead. In this article, we propose a very lightweight method that can be applied on top of existing hardware or software solutions. Our method, called forward-parameter tuning (FPT), takes advantage of a statistical property of the activation data of neural network layers, and can mitigate the impact of mild nonidealities in ReRAM crossbar arrays (RCAs) for deep learning applications without using any additional hardware, a dataset, or gradient-based training. Our experimental results using the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets in binary and multibit networks demonstrate that our technique is very effective, both alone and together with previous methods, at fault rates of up to 20%, which is higher than even some of the previous remapping methods can handle. We also evaluate our method in the presence of other nonidealities, such as variability and IR drop. Furthermore, we provide an analysis based on the concept of the effective fault rate (EFR), which not only demonstrates that EFR can be a useful tool for predicting the accuracy of faulty RCA-based neural networks but also explains why mitigating the SAF problem is more difficult with multibit neural networks.
Pages: 2174-2186
Page count: 13