A causal message logging protocol with asynchronous checkpointing for distributed systems

Times cited: 0

Authors:

Ahn, J [1]

Kim, K [1]

Hwang, C [1]

Affiliation:

[1] Korea Univ, Dept Comp Sci & Engn, Seoul 136701, South Korea

Source:

PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS | 2000

Keywords:

distributed systems; fault-tolerance; asynchronous checkpointing; causal message logging; recovery

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Causal message logging is an efficient approach to tolerating process failures in distributed systems because it combines the advantages of the pessimistic and optimistic message logging approaches. However, traditional causal message logging protocols prevent live processes from continuing their computation and require some synchronous logging to stable storage during recovery. Although Elnozahy's protocol solves these problems, it suffers from the central recovery leader problem. Additionally, if it were integrated with asynchronous checkpointing, it could cause inconsistency in the case of concurrent failures. In this paper, we present a new causal message logging protocol with asynchronous checkpointing that needs to maintain only the latest checkpoint of each process and allows live processes to continue their computation during recovery, even when concurrent failures occur. Moreover, the protocol solves the problems of Elnozahy's protocol and improves asynchrony during recovery, because it makes each recovering process responsible only for its own recovery.
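For readers unfamiliar with causal message logging, the following is a minimal Python sketch of the general idea the abstract builds on. It is not the protocol proposed by Ahn, Kim, and Hwang. Each delivery produces a determinant (source, ssn, dest, rsn) that is piggybacked on later outgoing messages, so live processes end up holding the determinants a failed process needs to replay its post-checkpoint deliveries in the original order. All names (Determinant, Process, recover, send_log) are hypothetical. The sketch assumes sender-based logging of message contents in volatile memory, assumes that every needed sender is still alive, keeps only the latest independently taken checkpoint, and never prunes stable determinants.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    """Delivery event: message (source, ssn) was the rsn-th delivery at dest."""
    source: int
    ssn: int
    dest: int
    rsn: int


@dataclass
class Process:
    pid: int
    ssn: int = 0                                   # send sequence number
    rsn: int = 0                                   # receive sequence number
    dets: set = field(default_factory=set)         # determinants held in volatile memory
    send_log: dict = field(default_factory=dict)   # hypothetical sender-based log: ssn -> payload
    state: list = field(default_factory=list)      # application state (delivered payloads)
    ckpt_rsn: int = 0                              # rsn recorded in the latest checkpoint
    ckpt_state: list = field(default_factory=list)

    def send(self, dest, payload):
        self.ssn += 1
        self.send_log[self.ssn] = payload
        # Piggyback every determinant this process knows (no stability pruning here).
        return {"source": self.pid, "ssn": self.ssn, "dest": dest,
                "payload": payload, "piggyback": frozenset(self.dets)}

    def deliver(self, msg):
        self.rsn += 1
        self.dets |= msg["piggyback"]
        self.dets.add(Determinant(msg["source"], msg["ssn"], self.pid, self.rsn))
        self.state.append(msg["payload"])

    def checkpoint(self):
        # Taken independently of other processes; only the latest one is kept.
        self.ckpt_rsn = self.rsn
        self.ckpt_state = list(self.state)


def recover(failed, live):
    """Restore `failed` from its checkpoint and replay deliveries using determinants held by `live`.

    Simplification: ssn and send_log of `failed` are not rolled back.
    """
    failed.state = list(failed.ckpt_state)
    failed.rsn = failed.ckpt_rsn
    failed.dets = set()
    # Only post-checkpoint deliveries at the failed process need to be replayed.
    needed = {d for p in live for d in p.dets
              if d.dest == failed.pid and d.rsn > failed.ckpt_rsn}
    by_pid = {p.pid: p for p in live}
    for d in sorted(needed, key=lambda d: d.rsn):
        payload = by_pid[d.source].send_log[d.ssn]   # assumes the sender is alive
        failed.rsn = d.rsn
        failed.dets.add(d)
        failed.state.append(payload)


if __name__ == "__main__":
    p1, p2, p3 = Process(1), Process(2), Process(3)
    p2.checkpoint()
    p2.deliver(p1.send(2, "a"))
    p2.deliver(p1.send(2, "b"))
    p3.deliver(p2.send(3, "c"))   # p3 now holds p2's two determinants
    recover(p2, [p1, p3])         # p2 fails; p3 supplies determinants, p1 supplies payloads
    print(p2.state, p2.rsn)       # prints ['a', 'b'] 2
```

Piggybacking every known determinant keeps the sketch short. Practical causal logging protocols typically stop piggybacking a determinant once it is stable (for example, once it has been written to stable storage or enough processes hold a copy). A delivery whose determinant no live process holds is simply not replayed, which is safe in this sketch because no live process can depend on it.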


Pages: 523-528

Number of pages: 6
