Checkpoint and rollback in asynchronous distributed systems

Cited by: 0
Authors
Higaki, H
Shima, K
Tachikawa, T
Takizawa, M
Affiliations
Source
IEEE INFOCOM '97 - THE CONFERENCE ON COMPUTER COMMUNICATIONS, PROCEEDINGS, VOLS 1-3: SIXTEENTH ANNUAL JOINT CONFERENCE OF THE IEEE COMPUTER AND COMMUNICATIONS SOCIETIES - DRIVING THE INFORMATION REVOLUTION | 1997
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
This paper proposes a novel algorithm for taking checkpoints and rolling back processes for recovery in asynchronous distributed systems. The algorithm has the following properties: (1) Multiple processes can initiate checkpointing simultaneously. (2) No additional messages are transmitted to take checkpoints. (3) The set of local checkpoints taken by the processes denotes a consistent global state. (4) Multiple processes can initiate rollback recovery simultaneously. (5) Only the minimum number of processes is rolled back. (6) Each process is rolled back asynchronously. The number of messages needed to roll back the processes is O(l), where l is the number of channels. The algorithm therefore keeps the system highly available.
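Two notions in the abstract, a consistent global state (property 3) and rolling back only the minimum number of processes (property 5), can be made concrete with a small example. The sketch below is not the authors' algorithm: it is a generic, centralized rollback-propagation check, assuming every message is logged as (sender, send event, receiver, receive event) and every checkpoint is stored as a count of local events. A global state is inconsistent when it contains an orphan message, one whose receipt is recorded but whose sending is not; the function rolls a receiver back to an earlier checkpoint only when an orphan forces it. The names `Message`, `orphans`, `is_consistent` and `recovery_line` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Message:
    """A logged application message (hypothetical representation).

    Events are numbered from 0 in each process's local history.
    """
    sender: int
    send_event: int
    receiver: int
    recv_event: int


def orphans(cut: Dict[int, int], messages: List[Message]) -> List[Message]:
    """Return messages whose receive is inside the cut but whose send is not.

    cut[p] = k means process p's state contains its first k events (0..k-1).
    """
    return [
        m for m in messages
        if m.recv_event < cut[m.receiver] and m.send_event >= cut[m.sender]
    ]


def is_consistent(cut: Dict[int, int], messages: List[Message]) -> bool:
    """A global state is consistent when it contains no orphan message."""
    return not orphans(cut, messages)


def recovery_line(
    current: Dict[int, int],
    checkpoints: Dict[int, List[int]],
    failed: Set[int],
    messages: List[Message],
) -> Dict[int, int]:
    """Compute a consistent recovery line by rollback propagation.

    current[p]     -- events executed so far by p (kept by surviving processes)
    checkpoints[p] -- sorted cut positions where p saved a checkpoint; must include 0
    failed         -- processes that restart from their latest checkpoint
    """
    cut = {p: (checkpoints[p][-1] if p in failed else current[p]) for p in current}
    while True:
        bad = orphans(cut, messages)
        if not bad:
            return cut
        m = bad[0]
        # Undo the orphan's receive: restore the latest checkpoint taken at or
        # before the receive event (a cut position <= recv_event excludes it).
        # Each step strictly lowers one cut position, so the loop terminates.
        cut[m.receiver] = max(c for c in checkpoints[m.receiver] if c <= m.recv_event)


if __name__ == "__main__":
    msgs = [
        Message(sender=0, send_event=4, receiver=1, recv_event=3),
        Message(sender=1, send_event=1, receiver=0, recv_event=2),
    ]
    line = recovery_line(
        current={0: 5, 1: 4},
        checkpoints={0: [0, 3], 1: [0, 2]},
        failed={0},
        messages=msgs,
    )
    print(line)  # {0: 3, 1: 2}
```

In the example, process 0 fails and restarts from its checkpoint after 3 events. Process 1 must then undo its receipt of the message that process 0 sent at event 4, so it returns to its checkpoint at 2; that rollback creates no new orphan, so process 0 goes no further back. This sketch assumes a global view of all logged messages, whereas the abstract describes processes that roll back asynchronously using O(l) messages.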
Pages: 998-1005
Page count: 8
Related papers
50 in total
[31]   Genuine atomic multicast in asynchronous distributed systems [J].
Guerraoui, R ;
Schiper, A .
THEORETICAL COMPUTER SCIENCE, 2001, 254 (1-2) :297-316
[32]   Compositional models of distributed and asynchronous dynamical systems [J].
Fabre, E .
PROCEEDINGS OF THE 41ST IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-4, 2002, :1-6
[33]   Solving election problem in asynchronous distributed systems [J].
Park, SeongHoon .
COMPUTATIONAL SCIENCE - ICCS 2006, PT 1, PROCEEDINGS, 2006, 3991 :736-743
[34]   PTRebeca: Modeling and analysis of distributed and asynchronous systems [J].
Jafari, Ali ;
Khamespanah, Ehsan ;
Sirjani, Marjan ;
Hermanns, Holger ;
Cimini, Matteo .
SCIENCE OF COMPUTER PROGRAMMING, 2016, 128 :22-50
[35]   Revisiting the election problem in asynchronous distributed systems [J].
Bauk, SU .
ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2005, 3756 :141-150
[36]   Reasoning about knowledge in asynchronous distributed systems [J].
Costa, Vania ;
Benevides, Mario .
LOGIC JOURNAL OF THE IGPL, 2005, 13 (01) :5-28
[37]   EFFICIENCY OF SYNCHRONOUS VERSUS ASYNCHRONOUS DISTRIBUTED SYSTEMS [J].
ARJOMANDI, E ;
FISCHER, MJ ;
LYNCH, NA .
JOURNAL OF THE ACM, 1983, 30 (03) :449-456
[38]   Distributed and asynchronous discrete event systems diagnosis [J].
Benveniste, A ;
Haar, S ;
Fabre, E ;
Jard, C .
42ND IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-6, PROCEEDINGS, 2003, :3742-3747
[39]   Asynchronous Work Stealing on Distributed Memory Systems [J].
Li, Shigang ;
Hu, Jingyuan ;
Cheng, Xin ;
Zhao, Chongchong .
PROCEEDINGS OF THE 2013 21ST EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING, 2013, :198-202
[40]   Reasoning about asynchronous behaviour in distributed systems [J].
Henderson, P .
EIGHTH IEEE INTERNATIONAL CONFERENCE ON ENGINEERING OF COMPLEX COMPUTER SYSTEMS, PROCEEDINGS, 2002, :17-24