Reliability of Single-Error Correction Protected Memories

Cited by: 18
Authors
Antonio Maestro, Juan [1]
Reviriego, Pedro [1]
Affiliations
[1] Univ Antonio Nebrija, Madrid 28040, Spain
Keywords
Error correction codes; memory; reliability; single event upsets (SEU); EVENT UPSET; SOFT ERRORS; DESIGN
DOI
10.1109/TR.2008.2006470
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Reliability is a critical factor for systems operating in radiation environments. Among the components of a system, memories are among the most sensitive to soft errors because of their relatively large area. Because of their high cost, traditional techniques such as Triple Modular Redundancy are not used to protect memories. A typical approach is to apply Error Correction Codes that correct single errors and detect double errors. Codes of this type, for example those based on Hamming codes, provide an initial level of protection. Detected single errors are usually corrected using scrubbing, in which the memory positions are periodically rewritten after a fixed period (deterministic scrubbing) or a variable period (probabilistic scrubbing). These traditional models usually offer good results when calculating the reliability of memories (e.g. through the Mean Time To Failure). However, to the best of our knowledge, there are some particularities that these approaches do not model. One of them is how double errors are handled. In a traditional approach, two errors in the same word always produce a system failure, because only single errors can be corrected. However, if the two (or more) errors affect the same bit, the second one either reinforces the first one (leaving just a single error) or corrects it. In both scenarios, the resulting situation does not trigger a system failure, which has a direct impact on the reliability of the memory. In this paper, traditional reliability models are refined to handle these scenarios, which yields a more precise calculation of the mean time to failure for memory systems.
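The same-bit effect described in the abstract can be illustrated with a small sketch. It is not the paper's model: it assumes a single word of `n_bits` bits with no scrubbing, upset events that each hit one uniformly random bit, and a failure as soon as two distinct bits are wrong. The function names and the `second_hit` parameter are hypothetical, introduced only for this illustration. Under the traditional assumption the word always fails at the second event; here a second hit on an already wrong bit either flips it back (`"flip"`) or leaves it wrong (`"reinforce"`).

```python
import random
from fractions import Fraction


def expected_events_to_failure(n_bits, second_hit="flip"):
    """Exact expected number of upset events until two distinct bits are wrong.

    Assumption (not from the paper): each event hits one uniformly random bit.
    With E0 = expected events starting from a clean word and E1 from one wrong bit:
      flip:      E0 = 1 + E1, E1 = 1 + E0 / n  ->  E0 = 2n / (n - 1)
      reinforce: E0 = 1 + E1, E1 = 1 + E1 / n  ->  E0 = (2n - 1) / (n - 1)
    """
    if n_bits < 2:
        raise ValueError("n_bits must be at least 2")
    n = Fraction(n_bits)
    if second_hit == "flip":
        return 2 * n / (n - 1)
    if second_hit == "reinforce":
        return (2 * n - 1) / (n - 1)
    raise ValueError("second_hit must be 'flip' or 'reinforce'")


def simulate_events_to_failure(n_bits, second_hit, rng):
    """Monte Carlo counterpart: count events until two distinct bits are wrong."""
    wrong = set()   # positions of bits currently in error
    events = 0
    while len(wrong) < 2:
        bit = rng.randrange(n_bits)
        events += 1
        if bit in wrong:
            if second_hit == "flip":
                wrong.remove(bit)   # second hit restores the original value
            # "reinforce": the bit simply stays wrong
        else:
            wrong.add(bit)
    return events


# The traditional model would report exactly 2 events to failure.
print(expected_events_to_failure(8, "flip"))       # 16/7
print(expected_events_to_failure(8, "reinforce"))  # 15/7
```

Both variants give an expected count above 2, so a model that treats every second error in a word as fatal underestimates the time to failure under these assumptions. The gap shrinks as the word gets longer, since a same-bit hit becomes less likely.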
Pages: 193-201
Number of pages: 9