An efficient and scalable checkpointing and recovery algorithm for distributed systems

Cited by: 0
Authors
Kumar, K. P. Krishna [1]
Hansdah, R. C. [1]
Affiliations
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India
Source
DISTRIBUTED COMPUTING AND NETWORKING, PROCEEDINGS | 2006 / Volume 4308
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In this paper, we describe an efficient coordinated checkpointing and recovery algorithm that works even when the channels are assumed to be non-FIFO and messages may be lost. Nodes are assumed to be autonomous, and they do not block while taking checkpoints. Based on its local conditions, any process can request permission from the previous coordinator to initiate a new checkpoint. Allowing multiple initiators of checkpoints avoids the bottleneck associated with a single initiator, but the algorithm permits only a single instance of the checkpointing process at any given time, thus avoiding much of the overhead associated with multiple initiators in distributed algorithms.
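The permission handoff described in the abstract can be illustrated with a toy model. This is only a minimal sketch under loud assumptions, not the authors' algorithm: the class `CheckpointCoordination` and its methods are hypothetical names, the coordinator state is kept in one in-memory object instead of being exchanged through messages, and message loss, non-FIFO channels, the checkpoints themselves, and recovery are all left out. It shows only the one property the abstract states: any process may ask to initiate, but at most one checkpointing instance is active at a time.

```python
class CheckpointCoordination:
    """Toy model (hypothetical): the initiator role is handed over on request."""

    def __init__(self, processes, first_coordinator):
        self.processes = set(processes)
        if first_coordinator not in self.processes:
            raise ValueError("unknown process: %r" % (first_coordinator,))
        self.coordinator = first_coordinator  # most recent initiator
        self.active = False                   # True while a round is running
        self.round = 0                        # number of rounds started so far

    def request_permission(self, requester):
        """Ask the previous coordinator for permission to start a new round.

        Returns True if granted, False if another round is still active.
        """
        if requester not in self.processes:
            raise ValueError("unknown process: %r" % (requester,))
        if self.active:
            return False  # only a single checkpointing instance at a time
        self.active = True
        self.coordinator = requester  # the requester becomes the coordinator
        self.round += 1
        return True

    def finish(self, initiator):
        """Mark the current round as complete; only its initiator may do so."""
        if not self.active or initiator != self.coordinator:
            raise RuntimeError("no active round initiated by %r" % (initiator,))
        self.active = False


coord = CheckpointCoordination(["p0", "p1", "p2"], first_coordinator="p0")
granted_p1 = coord.request_permission("p1")  # granted: no round is active
granted_p2 = coord.request_permission("p2")  # denied: p1's round is active
coord.finish("p1")
granted_p2_retry = coord.request_permission("p2")  # granted after p1 finishes
```

In this sketch a denied requester simply retries later; how the actual algorithm handles concurrent requests is not stated in the abstract.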
Pages: 94 - 99
Number of pages: 6
Related papers
  • [1] AN EFFICIENT PROTOCOL FOR CHECKPOINTING RECOVERY IN DISTRIBUTED SYSTEMS
    KIM, JL
    PARK, T
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1993, 4 (08) : 955 - 960
  • [2] A Scalable Communication-Induced Checkpointing Algorithm for Distributed Systems
    Simon, Alberto Calixto
    Hernandez, Saul E. Pomares
    Cruz, Jose Roberto Perez
    Gomez-Gil, Pilar
    Drira, Khalil
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (04) : 886 - 896
  • [3] Efficient recovery approach in distributed systems with hybrid checkpointing
    Jiang, YX
    Gupta, B
    COMPUTERS AND THEIR APPLICATIONS, 2000, : 292 - 297
  • [4] Scalable Checkpointing-based Rollback Recovery Protocol For Geographically Distributed Systems
    Ahn, Jinho
    INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY, PTS 1-4, 2013, 263-266 : 1492 - 1496
  • [5] Design and analysis of an efficient algorithm for coordinated checkpointing in distributed systems
    Cao, JN
    Jia, WJ
    Jia, XH
    Cheung, TY
    ADVANCES IN PARALLEL AND DISTRIBUTED COMPUTING - PROCEEDINGS, 1997, : 261 - 268
  • [6] An efficient communication induced rollforward checkpointing and recovery protocol for distributed systems
    Gu, MM
    Zeng, L
    Liang, ZH
    Gupta, B
    COMPUTERS AND THEIR APPLICATIONS, 2000, : 298 - 302
  • [7] An efficient non-intrusive checkpointing algorithm for distributed database systems
    Wu, Jiang
    Manivannan, D.
    DISTRIBUTED COMPUTING AND NETWORKING, PROCEEDINGS, 2006, 4308 : 82 - 87
  • [8] A communication-induced checkpointing and asynchronous recovery algorithm for multithreaded distributed systems
    Tantikul, T
    Manivannan, D
    PARALLEL AND DISTRIBUTED COMPUTING: APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2004, 3320 : 284 - 292
  • [9] CHECKPOINTING AND ROLLBACK-RECOVERY FOR DISTRIBUTED SYSTEMS
    KOO, R
    TOUEG, S
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1987, 13 (01) : 23 - 31
  • [10] Concurrent checkpointing & rollback recovery for distributed systems
    Ye, X
    Keane, JA
    EUROSIM '96 - HPCN CHALLENGES IN TELECOMP AND TELECOM: PARALLEL SIMULATION OF COMPLEX SYSTEMS AND LARGE-SCALE APPLICATIONS, 1996, : 211 - 218