A causal message logging protocol with asynchronous checkpointing for distributed systems

Times cited: 0
Authors
Ahn, J [1]
Kim, K [1]
Hwang, C [1]
Affiliation
[1] Korea Univ, Dept Comp Sci & Engn, Seoul 136701, South Korea
Keywords
distributed systems; fault-tolerance; asynchronous checkpointing; causal message logging; recovery
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Causal message logging is an efficient approach for tolerating process failures in distributed systems because it combines the advantages of the pessimistic and optimistic message logging approaches. However, traditional causal message logging protocols block live processes from continuing their computation and require some synchronous logging to stable storage during recovery. Although Elnozahy's protocol solves these problems, it depends on a central recovery leader. Additionally, if it is integrated with asynchronous checkpointing, it may lead to inconsistent states when failures occur concurrently. In this paper we present a new causal message logging protocol with asynchronous checkpointing that needs to maintain only the latest checkpoint of each process and allows live processes to continue their computation during recovery, even under concurrent failures. Moreover, the protocol solves the problems of Elnozahy's protocol and improves asynchrony during recovery, because each recovering process is responsible only for its own recovery.
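The abstract describes the protocol only at a high level. The Python sketch below illustrates the general causal-logging idea it builds on, not the authors' protocol: each delivery is recorded as a determinant (sender, send sequence number, receiver, receive sequence number), determinants are piggybacked on outgoing messages, each process keeps only its latest independently taken checkpoint, and a failed process drives its own recovery by collecting its determinants from live processes. All names (`Determinant`, `Message`, `Process`, `recover`) are hypothetical, and the sketch assumes sender-side message logs, no garbage collection of determinants, and no concurrent failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Determinant:
    """One delivery: sender and its send sequence number (ssn),
    receiver and its receive sequence number (rsn)."""
    source: int
    ssn: int
    dest: int
    rsn: int


@dataclass
class Message:
    source: int
    dest: int
    ssn: int
    payload: str
    piggyback: frozenset  # determinants known to the sender at send time


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.ssn = 0               # send sequence number
        self.rsn = 0               # receive sequence number
        self.dets = set()          # determinants in this process's causal past (volatile)
        self.sent_log = {}         # (dest, ssn) -> Message, kept by the sender for replay
        self.state = []            # toy application state: payloads in delivery order
        self.checkpoint = (0, [])  # only the latest checkpoint: (rsn, state)

    def send(self, dest, payload):
        self.ssn += 1
        # Causal logging: piggyback every determinant this process knows.
        msg = Message(self.pid, dest, self.ssn, payload, frozenset(self.dets))
        self.sent_log[(dest, self.ssn)] = msg
        return msg

    def deliver(self, msg):
        self.rsn += 1
        self.dets |= msg.piggyback
        self.dets.add(Determinant(msg.source, msg.ssn, self.pid, self.rsn))
        self.state.append(msg.payload)

    def take_checkpoint(self):
        # Independent (asynchronous) checkpoint that overwrites the previous one.
        self.checkpoint = (self.rsn, list(self.state))

    def crash(self):
        # Volatile state is lost. Simplification: ssn and sent_log are kept;
        # a full protocol would restore them and regenerate sends during replay.
        self.rsn, self.state, self.dets = 0, [], set()


def recover(failed, live):
    """The failed process drives its own recovery; no central leader is involved.
    Assumes every sender of a replayed message is live (no concurrent failures)."""
    ckpt_rsn, ckpt_state = failed.checkpoint
    # Gather determinants of the failed process's deliveries from live processes.
    dets = {d for p in live for d in p.dets if d.dest == failed.pid}
    failed.rsn, failed.state, failed.dets = ckpt_rsn, list(ckpt_state), set(dets)
    senders = {p.pid: p for p in live}
    # Replay post-checkpoint deliveries in their original receive order.
    for d in sorted((d for d in dets if d.rsn > ckpt_rsn), key=lambda d: d.rsn):
        failed.deliver(senders[d.source].sent_log[(failed.pid, d.ssn)])


p0, p1, p2 = Process(0), Process(1), Process(2)
p1.deliver(p0.send(1, "a"))
p1.take_checkpoint()            # checkpoint after "a"
p1.deliver(p2.send(1, "b"))
p1.deliver(p0.send(1, "c"))
p2.deliver(p1.send(2, "x"))     # piggybacks p1's determinants to p2
p1.crash()
recover(p1, [p0, p2])
print(p1.state)  # ['a', 'b', 'c']
```

If `p1` had crashed before sending `"x"`, no live process would hold the determinants for `"b"` and `"c"`; under causal logging this is acceptable because no live process depends on those deliveries, and a full protocol would have the senders retransmit them.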
Pages: 523-528
Number of pages: 6