A communication-induced checkpointing and asynchronous recovery algorithm for multithreaded distributed systems

Cited by: 0
Authors
Tantikul, T [1 ]
Manivannan, D [1 ]
Affiliation
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
Source
PARALLEL AND DISTRIBUTED COMPUTING: APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS | 2004, Vol. 3320
Keywords
distributed checkpointing; communication-induced checkpointing; fault-tolerance; multithreaded distributed system; asynchronous recovery;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Checkpointing and recovery in traditional distributed systems are relatively well established. However, checkpointing and recovery in multithreaded distributed systems have not been studied in the literature. Applying traditional checkpointing and recovery algorithms to multithreaded systems leads to the false causality problem and high checkpointing overhead. In the proposed algorithm, checkpointing is implemented at the process level to reduce the number of checkpoints, and recovery is implemented at the thread level to minimize the false causality problem. The algorithm also takes advantage of communication-induced checkpointing to reduce message overhead.
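The false causality problem named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It only contrasts dependency tracking at process granularity with tracking at thread granularity, under the assumption that threads t1 and t2 of process P never interact through shared state. The names `Tracker`, `unit_for`, and `appears_dependent` are hypothetical and introduced here for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Records, per tracking unit, the remote senders it has received from."""
    deps: dict = field(default_factory=dict)  # unit -> set of sender names

    def on_receive(self, unit, sender):
        # A receive makes the receiving unit depend on the sender.
        self.deps.setdefault(unit, set()).add(sender)

    def depends_on(self, unit, sender):
        return sender in self.deps.get(unit, set())


def unit_for(process, thread, granularity):
    # Hypothetical helper: the unit is the whole process or a single thread.
    return process if granularity == "process" else (process, thread)


def appears_dependent(granularity):
    """Thread t1 of P receives a message from Q.

    Returns whether thread t2 of P is then recorded as depending on Q.
    Assumption: t1 and t2 do not communicate with each other.
    """
    tracker = Tracker()
    tracker.on_receive(unit_for("P", "t1", granularity), "Q")
    return tracker.depends_on(unit_for("P", "t2", granularity), "Q")


if __name__ == "__main__":
    print(appears_dependent("process"))  # t2 is marked dependent on Q (false causality)
    print(appears_dependent("thread"))   # t2 is not marked dependent on Q
```

Under these assumptions, process-level tracking would force t2 to roll back whenever Q rolls back, even though t2 never received anything from Q. Thread-level tracking avoids that spurious dependency, at the cost of keeping more bookkeeping per process.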
Pages: 284-292
Number of pages: 9