Modeling and Analysis of Grid Service Reliability Considering Fault Recovery

Times cited: 5
Authors
Guo, Suchang [1]
Huang, Hong-Zhong [1]
Liu, Yu [1]
Affiliation
[1] University of Electronic Science and Technology of China, Chengdu 611731, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Grid; Service Reliability; Recoverability; Fault Tolerance; Fault Recovery; Performance; System
DOI
10.1007/s00354-009-0114-8
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The extreme complexity of grid systems makes high service reliability very difficult to achieve, and the difficulty is aggravated by the fact that many grid services perform time-consuming tasks that may require days or even months of computation. To improve grid service reliability, this paper studies a fault recovery technique for grid systems and investigates in depth the reliability modeling and analysis of grid services with fault recovery. Grid failures are classified into two categories: unrecoverable failures and recoverable failures. Software reliability is also taken into account. To make fault recovery more practical, certain constraints on fault recovery are introduced, and grid service reliability models under these constraints are developed. Numerical examples are presented, and the results are used to discuss the impact of fault recovery, and of the practical constraints, on grid service reliability.
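The interplay between the two failure categories and a limit on recovery can be illustrated with a deliberately simplified model. The sketch below is NOT the authors' model; it is a hypothetical illustration under stated assumptions: unrecoverable and recoverable failures arrive as independent Poisson processes with rates λ_u and λ_r, a task needs T time units, any unrecoverable failure aborts the task, each recoverable failure is recovered instantly with no lost work, at most K recoveries are allowed (the "practical constraint"), and software succeeds independently with probability R_s. Under these assumptions R = R_s · exp(-λ_u T) · Σ_{n=0}^{K} exp(-λ_r T)(λ_r T)^n / n!. The function name `grid_service_reliability` and its parameters are illustrative only.

```python
import math
from typing import Optional


def grid_service_reliability(
    task_time: float,
    rate_unrecoverable: float,
    rate_recoverable: float,
    max_recoveries: Optional[int] = None,
    software_reliability: float = 1.0,
) -> float:
    """Probability that a grid task of length task_time completes.

    Hypothetical illustrative model, not the paper's:
      * unrecoverable failures form a Poisson process with rate
        rate_unrecoverable; any single one aborts the task;
      * recoverable failures form an independent Poisson process with rate
        rate_recoverable; each is recovered instantly with no lost work,
        but at most max_recoveries recoveries are allowed (None = unlimited);
      * software succeeds independently with probability software_reliability.
    """
    if task_time < 0 or rate_unrecoverable < 0 or rate_recoverable < 0:
        raise ValueError("task time and failure rates must be non-negative")
    if max_recoveries is not None and max_recoveries < 0:
        raise ValueError("max_recoveries must be non-negative or None")
    if not 0.0 <= software_reliability <= 1.0:
        raise ValueError("software_reliability must lie in [0, 1]")

    # Probability of no unrecoverable failure during the task.
    r_unrecoverable = math.exp(-rate_unrecoverable * task_time)

    if max_recoveries is None:
        # Unlimited recovery: recoverable failures never abort the task.
        r_recoverable = 1.0
    else:
        # P(N <= K) with N ~ Poisson(rate_recoverable * task_time),
        # accumulated term by term to avoid computing large factorials.
        mean = rate_recoverable * task_time
        term = math.exp(-mean)
        total = term
        for n in range(1, max_recoveries + 1):
            term *= mean / n
            total += term
        r_recoverable = min(total, 1.0)  # guard against rounding above 1

    return software_reliability * r_unrecoverable * r_recoverable


if __name__ == "__main__":
    # Hypothetical numbers: a 100-hour task, failure rates per hour.
    for k in (0, 1, 2, 5, None):
        r = grid_service_reliability(100.0, 0.001, 0.01, max_recoveries=k)
        label = "unlimited" if k is None else k
        print(f"max recoveries = {label}: reliability = {r:.4f}")
```

In this toy model, K = 0 reduces to having no fault recovery, R = R_s · exp(-(λ_u + λ_r)T), while letting K grow raises reliability toward the unlimited-recovery bound R_s · exp(-λ_u T); the gap between the two is one simple way to see how a recovery constraint limits the benefit of fault recovery.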
Pages: 345-364
Number of pages: 20
References: 29