AN EFFICIENT PROTOCOL FOR CHECKPOINTING RECOVERY IN DISTRIBUTED SYSTEMS

Cited by: 40

Authors:

KIM, JL

PARK, T

Affiliation:

[1] Department of Computer Science, Texas A&M University

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 1993 / Vol. 4 / No. 8

Keywords:

CHECKPOINTING RECOVERY; CONSISTENT RECOVERY LINE; DISTRIBUTED SYSTEMS; FAULT TOLERANCE

DOI:

10.1109/71.238629

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

This paper presents a new efficient synchronized checkpointing protocol which exploits the dependency relation between processes in distributed systems. In our protocol, a process takes a checkpoint when it knows that all processes on which it computationally depends took their checkpoints, and hence the process need not always wait for the decision made by the checkpointing coordinator as in the conventional synchronized protocols. As a result, the checkpointing coordination time is substantially reduced and the possibility of total abort of the checkpointing coordination is reduced.
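The rule stated in the abstract (a process checkpoints once every process it computationally depends on has checkpointed) can be illustrated with a minimal sketch. This is not the authors' protocol: the function name `local_commit_rounds`, the dependency map, and the process names are hypothetical, and the treatment of mutually dependent processes as "left for the coordinator" is an assumption made only for illustration.

```python
from typing import Dict, List, Set, Tuple


def local_commit_rounds(
    deps: Dict[str, Set[str]],
) -> Tuple[List[List[str]], Set[str]]:
    """Simulate the dependency rule described in the abstract.

    deps maps each process to the set of processes it computationally
    depends on (hypothetical input format). Returns:
      - rounds: in each round, the processes that can checkpoint because
        all of their dependencies have already checkpointed;
      - undecided: processes that never satisfy the rule locally
        (assumed here to wait for the coordinator's decision).
    """
    done: Set[str] = set()
    pending: Set[str] = set(deps)
    rounds: List[List[str]] = []

    while pending:
        # A process is ready when every process it depends on is done.
        ready = sorted(p for p in pending if deps[p] <= done)
        if not ready:
            break  # remaining processes depend on each other
        rounds.append(ready)
        done.update(ready)
        pending.difference_update(ready)

    return rounds, pending


# Hypothetical dependency relation: P4 and P5 depend on each other.
deps = {
    "P1": set(),
    "P2": {"P1"},
    "P3": {"P1", "P2"},
    "P4": {"P5"},
    "P5": {"P4"},
}
rounds, undecided = local_commit_rounds(deps)
print(rounds)
print(sorted(undecided))
```

In this sketch P1, P2, and P3 decide without consulting a coordinator, while P4 and P5 cannot decide from local knowledge alone, which mirrors the abstract's wording that a process "need not always" wait for the coordinator.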

Pages: 955-960

Page count: 6
