Risk-resilient heuristics and genetic algorithms for security-assured grid job scheduling

Cited by: 124
Authors
Song, SS
Hwang, K
Kwok, YK
Affiliations
[1] Univ So Calif, Internet & Grid Res Lab, Los Angeles, CA 90089 USA
[2] Univ So Calif, Dept EE Syst, Los Angeles, CA 90089 USA
[3] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Hong Kong, Peoples R China
Funding
US National Science Foundation (NSF);
Keywords
grid computing; job scheduling heuristics; genetic algorithm; replication scheduling; risk resilience; NAS and PSA benchmarks; performance metrics; distributed supercomputing;
DOI
10.1109/TC.2006.89
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In scheduling a large number of user jobs for parallel execution on an open-resource Grid system, the jobs are subject to system failures or delays caused by infected hardware, software vulnerability, and distrusted security policy. This paper models the risk and insecure conditions in Grid job scheduling. Three risk-resilient strategies, preemptive, replication, and delay-tolerant, are developed to provide security assurance. We propose six risk-resilient scheduling algorithms to assure secure Grid job execution under different risky conditions. We report the simulated Grid performances of these new Grid job scheduling algorithms under the NAS and PSA workloads. The relative performance is measured by the total job makespan, Grid resource utilization, job failure rate, slowdown ratio, replication overhead, etc. In addition to extending from known scheduling heuristics, we developed a new space-time genetic algorithm (STGA) based on faster searching and protected chromosome formation. Our simulation results suggest that, in a wide-area Grid environment, it is more resilient for the global job scheduler to tolerate some job delays instead of resorting to preemption or replication or taking a risk on unreliable resources allocated. We find that delay-tolerant Min-Min and STGA job scheduling have 13-23 percent higher performance than using risky or preemptive or replicated algorithms. The resource overheads for replicated job scheduling are kept at a low 15 percent. The delayed job execution is optimized with a delay factor, which is 20 percent of the total makespan. A Kiviat graph is proposed for demonstrating the quality of Grid computing services. These risk-resilient job scheduling schemes can upgrade Grid performance significantly at only a moderate increase in extra resources or scheduling delays in a risky Grid computing environment.
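The Min-Min heuristic named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how risk is modeled, so the per-job `demand` and per-site `trust` scores, the admission rule `trust >= demand`, and the `secure_only` switch are all hypothetical. The delay-tolerant variant and the STGA are not covered.

```python
from typing import Dict, List, Tuple


def min_min_schedule(
    etc: List[List[float]],
    demand: List[float],
    trust: List[float],
    secure_only: bool = True,
) -> Tuple[Dict[int, int], float]:
    """Min-Min over an expected-time-to-compute matrix etc[job][site].

    demand[j] and trust[s] are hypothetical scores in [0, 1]. With
    secure_only=True a job may run only on sites whose trust meets its
    demand (an assumed rule); with secure_only=False every site is allowed.
    """
    n_sites = len(trust)
    ready = [0.0] * n_sites            # time at which each site becomes free
    unscheduled = set(range(len(etc)))
    assignment: Dict[int, int] = {}

    while unscheduled:
        best = None                    # (completion_time, job, site)
        for j in sorted(unscheduled):
            for s in range(n_sites):
                if secure_only and trust[s] < demand[j]:
                    continue           # site not trusted enough for this job
                finish = ready[s] + etc[j][s]
                if best is None or finish < best[0]:
                    best = (finish, j, s)
        if best is None:
            raise ValueError("no admissible site for the remaining jobs")
        finish, j, s = best            # job with the smallest earliest completion
        assignment[j] = s
        ready[s] = finish
        unscheduled.remove(j)

    return assignment, max(ready)


# Illustrative data only: 3 jobs, 2 sites.
etc = [[4.0, 6.0], [3.0, 9.0], [8.0, 2.0]]
demand = [0.5, 0.9, 0.3]
trust = [0.6, 0.95]

secure_plan, secure_makespan = min_min_schedule(etc, demand, trust, secure_only=True)
risky_plan, risky_makespan = min_min_schedule(etc, demand, trust, secure_only=False)
print(secure_plan, secure_makespan)
print(risky_plan, risky_makespan)
```

On this toy input the restricted schedule has a longer makespan than the unrestricted one. The example does not model job failures on untrusted sites, which is the cost the risky strategy takes on in exchange.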
Pages: 703-719
Page count: 17