Application-level fault tolerance as a complement to system-level fault tolerance

Cited by: 14
Authors
Haines, J [1]
Lakamraju, V [1]
Koren, I [1]
Krishna, CM [1]
Affiliations
[1] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Keywords
distributed real-time systems; fault tolerance; checkpointing; imprecise computation; target tracking; beam forming
DOI
10.1023/A:1008181429693
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As multiprocessor systems become more complex, their reliability will need to increase as well. In this paper we propose a novel technique that is applicable to a wide variety of distributed real-time systems, especially those exhibiting data parallelism. System-level fault tolerance involves reliability techniques incorporated within the system hardware and software, whereas application-level fault tolerance involves reliability techniques incorporated within the application software. We assert that, for high reliability, a combination of system-level fault tolerance and application-level fault tolerance works best. In many systems, application-level fault tolerance can be used to bridge the gap when system-level fault tolerance alone does not provide the required reliability. We exemplify this with the RTHT target tracking benchmark and the ABF beamforming benchmark.
Pages: 53-68
Number of pages: 16