CPPC-G: Fault-tolerant applications on the Grid

Cited by: 0
Authors
Diaz, Daniel [1]
Pardo, Xoan C. [1]
Martin, Maria J. [1]
Gonzalez, Patricia [1]
Rodriguez, Gabriel [1]
Affiliation
[1] Univ A Coruna, Comp Architecture Grp, La Coruna, Spain
Source
PARALLEL PROCESSING AND APPLIED MATHEMATICS | 2008 / Vol. 4967
Keywords
fault-tolerance; Grid computing; Globus; MPI; check-pointing
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The Grid community has made an important effort in developing middleware to provide different functionalities, such as resource discovery, resource management, job submission, and execution monitoring. As part of this effort, this paper addresses the design and implementation of a service-based architecture (CPPC-G) to manage the execution of fault-tolerant applications on Grids. The CPPC (Controller/Precompiler for Portable Checkpointing) framework is used to insert checkpoint instrumentation into the application code. The designed services are in charge of submitting the application and monitoring its execution, managing checkpoint files, and detecting and automatically restarting failed executions.
Pages: 852-859
Page count: 8