Selective checkpointing and rollbacks in multithreaded distributed systems

Cited by: 4
Authors
Kasbekar, M [1]
Das, CR [1]
Affiliation
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Source
21ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, PROCEEDINGS | 2001
DOI
10.1109/ICDSC.2001.918931
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Modern distributed systems are often multithreaded and object-oriented in their design. They require efficient techniques to checkpoint and restore their state for improving fault-tolerance properties. The traditional process-based techniques of distributed checkpointing and rollback algorithms suffer from the problem of false dependencies, which makes them very rigid and inefficient for use with modern systems. In this paper we develop protocols that can selectively checkpoint (and rollback) some threads of a distributed system, while leaving others untouched and yet ensuring the consistency of state resulting from such a partial rollback.
Pages: 39-46
Page count: 8