A technique for non-invasive application-level checkpointing

Citations: 8
Authors
Arora, Ritu [1]
Bangalore, Purushotham [1]
Mernik, Marjan [1, 2]
Affiliations
[1] Univ Alabama Birmingham, Dept Comp & Informat Sci, Birmingham, AL 35294 USA
[2] Univ Maribor, Fac Elect Engn & Comp Sci, SLO-2000 Maribor, Slovenia
Funding
National Science Foundation (USA)
Keywords
Fault-tolerance; Application-level checkpointing; Domain-specific language; PARALLEL;
DOI
10.1007/s11227-010-0383-5
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Checkpointing is one of the key elements required for writing self-healing applications for distributed and dynamic computing environments. It is a mechanism by which an application is made resilient to failures by periodically storing its state to disk. The main goal of this research is to enable non-invasive reengineering of existing applications so that an Application-Level Checkpointing (ALC) mechanism can be inserted into them. The Domain-Specific Language (DSL) developed in this research serves this purpose and is used for obtaining the ALC specifications from end users. These specifications are used for generating the actual checkpointing code and inserting it into the existing application. The performance of the application containing the generated checkpointing code is comparable to that of the application in which the checkpointing code was inserted manually. With slight modifications, the DSL can be used for specifying the ALC mechanism in several base languages (e.g., C/C++, Java, and FORTRAN).
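To make the idea of application-level checkpointing concrete, the following is a minimal hand-written sketch in Python. It is not the code generated by the paper's DSL, and Python is not one of the base languages the abstract names. The function names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON file format, and the `fail_at` parameter used to simulate a crash are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before the rename
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(path, total_steps, interval, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing every `interval` steps.

    `fail_at` is a hypothetical knob that simulates a crash at a given step.
    """
    # Resume from the last checkpoint if one exists, else start fresh.
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

Calling `run` again with the same checkpoint path after a failure resumes from the last saved step rather than from step 0. In this sketch the checkpointing calls are written directly into the application loop, which is the manual, invasive style that the abstract contrasts with generating and inserting the code from a specification.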
Pages: 227-255
Page count: 29