Efficient techniques for adaptive independent checkpointing in distributed systems

Cited by: 0
Authors
Lin, CM [1]
Dow, CR [1]
Affiliations
[1] Feng Chia Univ, Dept Comp Sci & Informat Engn, Taichung 40724, Taiwan
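Keywords
distributed systems; fault tolerance; checkpointing; failure recovery
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract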
This work presents two novel algorithms to prevent rollback propagation for independent checkpointing: an efficient adaptive independent checkpointing algorithm and an optimized adaptive independent checkpointing algorithm. The last opportunity strategy, which yields better performance than the conservation strategy, is also employed to prevent useless checkpoints for both causal rewinding paths and non-causal rewinding paths. The two methods proposed herein are domino effect-free and require only a limited amount of control information. They also take fewer unnecessary adaptive checkpoints than other algorithms. Furthermore, experimental results indicate that the checkpoint overhead of our techniques is lower than that of the coordinated checkpointing and domino effect-free algorithms for service-providing applications.
Pages: 1642-1653
Number of pages: 12