Design of a fault tolerant Solid State Mass Memory

Cited by: 33
Authors
Cardarilli, GC [1]
Leandri, A [1]
Marinucci, P [1]
Ottavi, M [1]
Pontarelli, S [1]
Re, M [1]
Salsano, A [1]
Affiliations
[1] Univ Roma Tor Vergata, Dept Elect Engn, Rome, Italy
Keywords
codes; memory architecture; redundancy; self checking; solid state mass memory; SSMM;
DOI
10.1109/TR.2003.821938
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper describes a novel architecture for a fault-tolerant Solid State Mass Memory (SSMM) for satellite applications. Mass memories with low latency, high throughput, and high storage capacity cannot be easily implemented using space-qualified components, because of the inevitable technological delay of this kind of component. For this reason, the use of Commercial Off The Shelf (COTS) components is mandatory for this application. Consequently, the design of an electronic system for space applications based on commercial components must meet the reliability requirements using system-level methodologies [1], [2]. In the proposed architecture, error-correcting codes are used to strengthen the commercial Dynamic Random Access Memory (DRAM) chips, while the system controller is developed by applying fault-tolerant design solutions. The main features of the SSMM are its dynamic reconfiguration capability and its high performance, which can be gracefully reduced in case of permanent faults while maintaining part of the system functionality. This paper presents the system design methodology, the architecture, and the simulation results of the SSMM. The building blocks are described in detail in terms of both their functionality and their fault-tolerance capabilities. A detailed analysis of the system reliability and data integrity is reported. The graceful degradation capability of the system allows different levels of acceptable performance, in terms of active I/O link interfaces and storage capability. The results also show that the overall reliability of the SSMM is almost the same with different Reed-Solomon (RS) coding schemes, which allows the coding to be dynamically reconfigured to reduce latency (shorter codewords) or to improve data integrity (longer codewords). A scrubbing technique can be useful if a high single event upset (SEU) rate is expected, or if the data must be stored in the SSMM for a long period.
The reported simulations show the behavior of the SSMM in the presence of permanent and transient faults. In particular, we show that the SCU is able to recover from transient faults, while hard faults can also be tolerated by using a spare microcontroller. The distributed file system confines the effects of an unrecoverable fault to a single I/O Interface. In this way, the SSMM maintains its capability to store and read data. The proposed system makes it possible to obtain an SSMM characterized by high reliability and high speed, due to the intrinsic parallelism of the switching matrix.
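The abstract's remarks on codeword length and scrubbing can be illustrated with a minimal sketch. This is not the paper's reliability model: it assumes independent symbol upsets at a constant rate, an RS(n, k) code that corrects up to (n - k) // 2 symbol errors, and scrubbing that fully repairs every correctable codeword at each pass. All function names and numeric parameters below are hypothetical.

```python
from math import comb, exp, expm1, log1p


def uncorrectable_prob(n, k, p):
    """Probability that an RS(n, k) codeword holds more than
    t = (n - k) // 2 symbol errors, for independent symbol error
    probability p (assumed binomial model)."""
    t = (n - k) // 2
    # Sum the tail directly to avoid cancellation when p is tiny.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(t + 1, n + 1))


def loss_prob(n, k, upset_rate, storage_time, scrub_interval=None):
    """Probability that one codeword becomes uncorrectable during
    storage_time. upset_rate is upsets per symbol per unit time
    (hypothetical value). Without scrubbing, errors accumulate over
    the whole storage time; with scrubbing, only over one interval."""
    if scrub_interval is None:
        scrub_interval = storage_time
    p_symbol = 1.0 - exp(-upset_rate * scrub_interval)
    per_interval = uncorrectable_prob(n, k, p_symbol)
    if per_interval >= 1.0:
        return 1.0
    intervals = storage_time / scrub_interval
    # 1 - (1 - x)**m, computed stably for small x.
    return -expm1(intervals * log1p(-per_interval))


if __name__ == "__main__":
    rate, total = 1e-4, 1000.0  # hypothetical upset rate and storage time
    for n, k in [(36, 32), (68, 64)]:  # illustrative code sizes only
        print(n, k,
              loss_prob(n, k, rate, total),
              loss_prob(n, k, rate, total, scrub_interval=10.0))
```

Running the script prints, for each illustrative code, the loss probability without scrubbing and with a scrubbing pass every 10 time units, which makes it easy to see how the scrubbing period and the upset rate interact under these assumptions.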
Pages: 476-491
Number of pages: 16
Related papers
29 in total
[1]  
BERTAZZONI S, 1999, INT S DEF FAULT TOL, P158
[2]  
Blahut R. E., 1983, THEORY PRACTICE ERRO
[3]   A fault-tolerant 176 gbit solid state mass memory architecture [J].
Cardarilli, GC ;
Marinucci, P ;
Ottavi, M ;
Salsano, A .
IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI SYSTEMS, PROCEEDINGS, 2000, :173-180
[4]  
CARDARILLI GC, 2000, P IEEE INT S CIRC SY, V2, P673
[5]   A MEASURE OF GRACEFUL DEGRADATION IN PARALLEL-COMPUTER SYSTEMS [J].
CHERKASSKY, V ;
MALEK, M .
IEEE TRANSACTIONS ON RELIABILITY, 1989, 38 (01) :76-81
[6]  
DANGELO S, 1999, IEEE INT S DEF FAULT, P330
[7]   Fault-tolerance of spaceborne Semiconductor Mass Memories [J].
Fichna, T ;
Gartner, M ;
Gliem, F ;
Rombeck, F .
TWENTY-EIGHTH ANNUAL INTERNATIONAL SYMPOSIUM ON FAULT-TOLERANT COMPUTING, DIGEST PAPERS, 1998, :408-413
[8]  
FOX J, 1997, 4 EUR C RAD ITS EFF, P240
[9]  
GARDARILLI GC, 2003, P 2003 INT S CIRC SY, V5, pV649
[10]   Comparison and application of different VHDL-based fault injection techniques [J].
Gracia, J ;
Baraza, JC ;
Gil, D ;
Gil, PJ .
2001 IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI SYSTEMS, PROCEEDINGS, 2001, :233-241