Checkpoint and rollback in asynchronous distributed systems

Cited by: 0
Authors
Higaki, H
Shima, K
Tachikawa, T
Takizawa, M
Source
IEEE INFOCOM '97 - THE CONFERENCE ON COMPUTER COMMUNICATIONS, PROCEEDINGS, VOLS 1-3: SIXTEENTH ANNUAL JOINT CONFERENCE OF THE IEEE COMPUTER AND COMMUNICATIONS SOCIETIES - DRIVING THE INFORMATION REVOLUTION | 1997
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper proposes a novel algorithm for taking checkpoints and rolling back processes for recovery in asynchronous distributed systems. The algorithm has the following properties: (1) Multiple processes can simultaneously initiate checkpointing. (2) No additional messages are transmitted for taking checkpoints. (3) A set of local checkpoints taken by multiple processes denotes a consistent global state. (4) Multiple processes can simultaneously initiate rollback recovery. (5) The minimum number of processes is rolled back. (6) Each process is rolled back asynchronously. The number of messages for rolling back the processes is O(l), where l is the number of channels. Therefore, the algorithm presented in this paper keeps the system highly available.
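Property (3) relies on the standard notion of a consistent global state: no message is recorded as received in one process's checkpoint unless its sending is recorded in the sender's checkpoint. The following is a minimal sketch of that check only, not the algorithm proposed in the paper. The names `Message`, `orphan_messages`, and `is_consistent` are hypothetical, and the sketch assumes each process numbers its local events with a counter and that a checkpoint covers all events up to and including its counter value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    send_time: int  # sender's local event counter at the send event (assumed model)
    recv_time: int  # receiver's local event counter at the receive event


def orphan_messages(checkpoints, messages):
    """Return messages whose receipt is inside the receiver's checkpoint
    but whose send is after the sender's checkpoint."""
    return [
        m for m in messages
        if m.recv_time <= checkpoints[m.receiver]
        and m.send_time > checkpoints[m.sender]
    ]


def is_consistent(checkpoints, messages):
    """A set of local checkpoints is consistent if it has no orphan messages."""
    return not orphan_messages(checkpoints, messages)
```

For example, with one message sent by `p` at its event 5 and received by `q` at its event 3, the checkpoints `{"p": 4, "q": 4}` are inconsistent, because `q` remembers a message that `p` has not yet sent. Moving `p`'s checkpoint to 5, or `q`'s back to 2, removes the orphan.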
Pages: 998-1005
Page count: 8