Checkpoint and rollback in asynchronous distributed systems

Cited by: 0
Authors
Higaki, H
Shima, K
Tachikawa, T
Takizawa, M
Source
IEEE INFOCOM '97 - THE CONFERENCE ON COMPUTER COMMUNICATIONS, PROCEEDINGS, VOLS 1-3: SIXTEENTH ANNUAL JOINT CONFERENCE OF THE IEEE COMPUTER AND COMMUNICATIONS SOCIETIES - DRIVING THE INFORMATION REVOLUTION | 1997
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper proposes a novel algorithm for taking checkpoints and rolling back processes for recovery in asynchronous distributed systems. The algorithm has the following properties: (1) Multiple processes can initiate checkpointing simultaneously. (2) No additional messages are transmitted to take checkpoints. (3) The local checkpoints taken by the processes together form a consistent global state. (4) Multiple processes can initiate rollback recovery simultaneously. (5) The minimum number of processes is rolled back. (6) Each process is rolled back asynchronously. The number of messages needed to roll back the processes is O(l), where l is the number of channels. The algorithm therefore keeps the system highly available.
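The abstract relies on two ideas: a set of local checkpoints forming a consistent global state (property 3), and rolling back only the processes that must be rolled back (property 5). The sketch below illustrates those two ideas with a small model; it is not the authors' algorithm, which this record does not describe. It assumes that each process counts, per peer, the messages it has sent and received, that each process keeps only its latest checkpoint, and that those checkpoints are mutually consistent. The names `LocalState`, `is_consistent`, and `rollback_set` are hypothetical. For clarity the rollback set is computed centrally, whereas the paper's algorithm rolls processes back asynchronously using O(l) messages.

```python
from dataclasses import dataclass, field


# Hypothetical model of a process state: per-peer counts of messages
# sent to and received from each peer.
@dataclass
class LocalState:
    sent: dict[str, int] = field(default_factory=dict)  # peer -> messages sent
    recv: dict[str, int] = field(default_factory=dict)  # peer -> messages received


def is_consistent(states: dict[str, LocalState]) -> bool:
    """A global state is consistent if no process has received more messages
    from a peer than that peer has sent to it (no orphan messages)."""
    for j, sj in states.items():
        for i, n_recv in sj.recv.items():
            if n_recv > states[i].sent.get(j, 0):
                return False
    return True


def rollback_set(current: dict[str, LocalState],
                 checkpoint: dict[str, LocalState],
                 failed: str) -> set[str]:
    """Return the processes that must roll back to their latest checkpoint
    when `failed` does. Assumes the latest checkpoints are mutually consistent,
    so only orphan messages force additional rollbacks."""
    rolled = {failed}
    changed = True
    while changed:  # iterate until no new process is forced to roll back
        changed = False
        view = {p: checkpoint[p] if p in rolled else current[p] for p in current}
        for j in current:
            if j in rolled:
                continue
            # j holds an orphan if it received a message its (rolled-back) sender no longer sent
            if any(n > view[i].sent.get(j, 0) for i, n in view[j].recv.items()):
                rolled.add(j)
                changed = True
    return rolled


# Usage: after the checkpoints, P sends one message to Q and Q sends one to R.
ckpt = {p: LocalState() for p in "PQR"}
cur = {
    "P": LocalState(sent={"Q": 1}),
    "Q": LocalState(sent={"R": 1}, recv={"P": 1}),
    "R": LocalState(recv={"Q": 1}),
}
print(rollback_set(cur, ckpt, "P"))  # P's rollback cascades to Q, then to R
print(rollback_set(cur, ckpt, "R"))  # R's rollback forces no other process back
```

A failure of P orphans the message Q received, which in turn orphans the message R received, so all three roll back; a failure of R orphans nothing, so R alone rolls back.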
Pages: 998-1005
Page count: 8