Message logging: Pessimistic, optimistic, causal, and optimal

被引:117
作者
Alvisi, L [1 ]
Marzullo, K
机构
[1] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
基金
美国国家科学基金会; 美国国家航空航天局;
关键词
message logging; optimistic protocols; pessimistic protocols; checkpoint-restart protocols; resilient processes; specification of fault-tolerance techniques;
D O I
10.1109/32.666828
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Message-logging protocols are an integral part of a popular technique for implementing processes that can recover from crash failures. All message-logging protocols require that, when recovery is complete, there be no orphan processes, which are surviving processes whose states are inconsistent with the recovered state of a crashed process. We give a precise specification of the consistency property "no orphan processes." From this specification, we describe how different existing classes of message-logging protocols (namely optimistic, pessimistic, and a class that we call causal) implement this property. We then propose a set of metrics to evaluate the performance of message-logging protocols, and characterize the protocols that are optimal with respect to these metrics. Finally, starting from a protocol that relies on causal delivery order, we show how to derive optimal causal protocols that tolerate f overlapping failures and recoveries for a parameter f: 1 less than or equal to f less than or equal to n.
引用
收藏
页码:149 / 159
页数:11
相关论文
共 22 条
[1]  
Alvisi L., 1996, Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing, P58, DOI 10.1145/248052.248061
[2]  
ALVISI L, 1993, P 23 FAULT TOL COMP, P145
[3]  
ALVISI L, 1996, THESIS CORNELL U DEP
[4]  
BIRMAN K, 1991, ACM T COMPUT SYST, V9, P272, DOI 10.1145/128738.128742
[5]   RELIABLE COMMUNICATION IN THE PRESENCE OF FAILURES [J].
BIRMAN, KP ;
JOSEPH, TA .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1987, 5 (01) :47-76
[6]  
BORG A, 1983, P 9 ACM S OP SYST PR, P90
[7]   MANETHO - TRANSPARENT ROLLBACK-RECOVERY WITH LOW OVERHEAD, LIMITED ROLLBACK, AND FAST OUTPUT COMMIT [J].
ELNOZAHY, EN ;
ZWAENEPOEL, W .
IEEE TRANSACTIONS ON COMPUTERS, 1992, 41 (05) :526-531
[8]  
ELNOZAHY EN, 1994, 24 INT S FAULT TOL C, P298
[9]  
GRAY JN, 1977, LECT NOTES COMPUTER, V60, P393
[10]  
Johnson D. B., 1987, 17 ANN INT S FAULT T, P14