Consistency issues in distributed checkpoints

Times cited: 31
Authors
Hélary, JM
Netzer, RHB
Raynal, M
Affiliations
[1] Univ Rennes 1, IRISA, F-35042 Rennes, France
[2] Brown Univ, Dept Comp Sci, Providence, RI 02921 USA
Keywords
checkpointing; consistency; strong consistency; transitlessness; distributed systems; fault-tolerance; rollback recovery
DOI
10.1109/32.761450
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
A global checkpoint is a set of local checkpoints, one per process. The traditional consistency criterion for global checkpoints states that a global checkpoint is consistent if it does not include messages received and not sent. This paper investigates two other consistency criteria: transitlessness and strong consistency. A global checkpoint is transitless if it does not exhibit messages sent and not received. Transitlessness can be seen as a dual of traditional consistency. Strong consistency is the addition of transitlessness to traditional consistency. The main result of this paper is a statement of the necessary and sufficient condition answering the following question: "Given an arbitrary set of local checkpoints, can this set be extended to a global checkpoint that satisfies P?" (where P is traditional consistency, transitlessness, or strong consistency). From a practical point of view, this condition, when applied to transitlessness, is particularly interesting as it helps characterize which messages do not need to be recorded by checkpointing protocols.
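The three criteria defined in the abstract can be illustrated with a minimal Python sketch. The model is an assumption made for illustration and is not taken from the paper: each message records the positions of its send and receive events in the local histories of the two processes, and a global checkpoint is a mapping from each process to the number of local events that precede its checkpoint. The names `Message`, `send_pos`, `recv_pos`, and `cut` are hypothetical. The sketch only tests a complete global checkpoint; it does not implement the paper's necessary and sufficient condition for extending an arbitrary set of local checkpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical representation: positions are 0-based indices of the
    # send/receive events in each process's local event sequence.
    sender: str
    send_pos: int
    receiver: str
    recv_pos: int


def _sent(m, cut):
    # The send event lies before the sender's local checkpoint.
    return m.send_pos < cut[m.sender]


def _received(m, cut):
    # The receive event lies before the receiver's local checkpoint.
    return m.recv_pos < cut[m.receiver]


def is_consistent(messages, cut):
    # Traditional consistency: no message is received and not sent.
    return not any(_received(m, cut) and not _sent(m, cut) for m in messages)


def is_transitless(messages, cut):
    # Transitlessness: no message is sent and not received.
    return not any(_sent(m, cut) and not _received(m, cut) for m in messages)


def is_strongly_consistent(messages, cut):
    # Strong consistency: traditional consistency plus transitlessness.
    return is_consistent(messages, cut) and is_transitless(messages, cut)


# One message from p to q; it is the first event on both processes.
messages = [Message("p", 0, "q", 0)]

in_transit = {"p": 1, "q": 0}  # sent before p's checkpoint, not yet received
orphan = {"p": 0, "q": 1}      # received before q's checkpoint, not yet sent
both = {"p": 1, "q": 1}        # sent and received before the checkpoints
```

With this single message, `in_transit` is consistent but not transitless, `orphan` is transitless but not consistent, and `both` is strongly consistent, which shows the duality between the two criteria mentioned in the abstract.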
Pages: 274-281
Page count: 8