Design, implementation and performance of fault-tolerant message passing interface (MPI)

Cited by: 0
Authors
Selvakumar, AD [1]
Sobha, PM [1]
Ravindra, GC [1]
Pitchiah, R [1]
Affiliation
[1] C-DAC, Real Time Systems Group, Bangalore 560038, Karnataka, India
Source
PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS | 2004
Keywords
synchronous/asynchronous checkpointing; fault-tolerance; rollback/recovery; message passing interface; task migration; cluster computing
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Fault Tolerant MPI (FTMPI) adds fault tolerance to MPICH [11]. FTMPI is a transparent fault-tolerant environment based on a synchronous checkpointing and restart mechanism. FTMPI relies on a non-multithreaded, single-process checkpointing library to checkpoint an application process. A global, replicated System Controller and cluster-node-specific Node Controllers monitor and control the checkpointing and recovery activities of all MPI applications within the cluster. This paper details the architecture that provides a fault-tolerance mechanism for MPI-based applications running on clusters, and reports the performance of the NAS parallel benchmarks and the parallelized medium-range weather forecasting models P-T80 and P-T126. The architecture also addresses the following issues: replicating the System Controller to avoid a single point of failure; ensuring consistency of checkpoint files using a distributed two-phase commit protocol; and providing a robust fault detection hierarchy.
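The abstract does not describe how the two-phase commit over checkpoint files is implemented. As a rough illustration of the general idea only, the following Python sketch shows one way a coordinator could keep a set of per-process checkpoint files mutually consistent. The class `NodeController`, the function `coordinated_checkpoint`, the file naming, and the `fail_on_prepare` flag are all hypothetical and are not taken from FTMPI.

```python
import os


class NodeController:
    """Hypothetical per-node participant that owns one checkpoint file."""

    def __init__(self, directory, rank, fail_on_prepare=False):
        self.final_path = os.path.join(directory, f"ckpt_rank{rank}.bin")
        self.temp_path = self.final_path + ".tmp"
        self.fail_on_prepare = fail_on_prepare  # simulates a fault (assumption)

    def prepare(self, state: bytes) -> bool:
        """Phase 1: write a tentative checkpoint and vote yes/no."""
        if self.fail_on_prepare:
            return False  # vote "abort"
        with open(self.temp_path, "wb") as f:
            f.write(state)
            f.flush()
            os.fsync(f.fileno())  # make sure the data reaches the disk
        return True

    def commit(self):
        """Phase 2 (commit): atomically replace the previous checkpoint."""
        os.replace(self.temp_path, self.final_path)

    def abort(self):
        """Phase 2 (abort): drop the tentative file, keep the old checkpoint."""
        if os.path.exists(self.temp_path):
            os.remove(self.temp_path)


def coordinated_checkpoint(nodes, states) -> bool:
    """Coordinator: commit only if every participant voted yes."""
    votes = [node.prepare(state) for node, state in zip(nodes, states)]
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.abort()
    return False
```

In this sketch a failed vote from any one participant leaves every node with its previous checkpoint file, so a later rollback would restart all processes from the same checkpoint round. A real system would also have to handle coordinator failure and message loss, which this single-process illustration ignores.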
Pages: 145-150
Page count: 6
Related papers
12 in total
[1]  
AGBARIA A, 1999, 8 IEEE INT S HIGH PE
[2]  
BATCHU R, 2001, 1 INT S CLUST COMP G
[3]  
BECH, 1999, J FUTURE GENERATION
[4]  
BOSILCA G, P P IEEE ACM SC2002
[5]  
FAGG GF, 2000, EUROPVM MPI USERS GR
[6]  
PRUITT PN, THESIS COLL W M VIRG
[7]   A task migration implementation of the message-passing interface [J].
Robinson, J ;
Russ, SH ;
Flachs, B ;
Heckel, B .
PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, 1996, :61-68
[8]  
ROBINSON J, IEEE T PARALLEL DIST
[9]  
RUSS SH, EIRSERC956 MSSU
[10]  
RUSS SH, HECTOR AGENT ARCHITE