DYNAMIC CHECKPOINTING PROCEDURE FOR THE DESIGN OF STABILIZING PROTOCOLS

Cited by: 3
Authors
SALEH, K [1]
AHMAD, I [1]
ALSAQABI, K [1]
AGARWAL, A [1]
Affiliations
[1] CONCORDIA UNIV,DEPT ELECT & COMP ENGN,MONTREAL H3G 1M8,QUEBEC,CANADA
Keywords
CHECKPOINTING; COMMUNICATING FINITE STATE MACHINE; COMMUNICATION PROTOCOL; PROTOCOL DESIGN; PROTOCOL STABILIZATION
DOI
10.1016/0950-5849(93)90046-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper addresses the problem of designing stabilizing computer communication protocols. A communication protocol is said to be stabilizing if, starting from or being at any illegal global state, it eventually reaches a legal (or consistent) global state and resumes its normal execution. To achieve stabilization, the protocol must be able to detect an error when it occurs, recover from that error, and revert to a legal protocol state. Based on the concepts of event indices and maximally reachable event index tuples, we propose a novel approach for distributed dynamic checkpointing that avoids the overhead associated with the more traditional periodic checkpointing techniques. Furthermore, our checkpointing technique can be used as the basis for optimal protocol recovery to achieve stabilization. An example illustrating the new dynamic checkpointing technique is also provided.
Pages: 479-485
Page count: 7