Rescheduling for reliable job completion with the support of clouds

Times cited: 45
Authors
Lee, Young Choon [1]
Zomaya, Albert Y. [1]
Affiliation
[1] Univ Sydney, Ctr Distributed & High Performance Comp, Sch Informat Technol, Sydney, NSW 2006, Australia
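Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2010 / Vol. 26 / No. 8
Funding
Australian Research Council
Keywords
INDEPENDENT TASKS
DOI
10.1016/j.future.2010.02.010
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202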
Abstract
A major performance issue in large-scale decentralized distributed systems, such as grids, is how to ensure that jobs finish their execution within their estimated completion times in the presence of resource performance fluctuations. Several techniques, including advance reservation, rescheduling and migration, have previously been adopted to resolve or relieve this issue; however, they face non-negligible practical hurdles. The use of clouds may be an attractive alternative, since resources in clouds are much more reliable than those in grids. This paper investigates the effectiveness of rescheduling using cloud resources to increase the reliability of job completion. Specifically, schedules are initially generated using grid resources, and cloud resources (which are relatively more costly) are used only for rescheduling to cope with a delay in job completion. A job in our study refers to a bag-of-tasks (BoT) application that consists of a large number of independent tasks; this job model is common in many science and engineering applications. We have devised a novel rescheduling technique, called rescheduling using clouds for reliable completion (RC2), and applied it to three well-known existing heuristics. Our experimental results reveal that RC2 significantly reduces the delay in job completion. (c) 2010 Elsevier B.V. All rights reserved.
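The abstract describes a two-phase approach: build an initial schedule on grid resources, then reschedule onto costlier cloud resources only when job completion is delayed. The following Python sketch illustrates that general idea under stated assumptions; it is not the RC2 algorithm from the paper, whose details are not given in this record. All of the following are assumptions for illustration: the classic Min-Min heuristic stands in for the "well-known heuristics" the abstract mentions; task sizes are known in work units; the observed grid speeds at rescheduling time `now` are known and are assumed to have held since time 0; the initial makespan serves as the estimated completion time (deadline); and cloud instances are available immediately at `now` with constant speed. The names `min_min`, `reschedule_with_cloud` and `Resource` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    speed: float  # estimated work units processed per time unit


def min_min(tasks: dict[str, float], resources: list[Resource]):
    """Min-Min heuristic (stand-in for the initial grid schedule).

    Repeatedly assigns the (task, resource) pair with the earliest completion
    time. Returns each resource's ordered task queue and the estimated
    job completion time (makespan)."""
    speed = {r.name: r.speed for r in resources}
    ready = {name: 0.0 for name in speed}
    queues: dict[str, list[str]] = {name: [] for name in speed}
    pending = dict(tasks)
    while pending:
        # Ties in completion time are broken by task name, then resource name.
        ct, task, name = min(
            (ready[n] + work / s, t, n)
            for t, work in pending.items()
            for n, s in speed.items()
        )
        ready[name] = ct
        queues[name].append(task)
        del pending[task]
    return queues, max(ready.values())


def reschedule_with_cloud(queues, tasks, actual_speed, now, deadline, cloud):
    """Hypothetical rescheduling step at time `now`.

    Assumes each grid resource has run at its observed `actual_speed` since
    time 0. A task stays on the grid if it has already started (start < now)
    or still finishes by `deadline`; otherwise it is moved to the cloud.
    Cloud instances are assumed to be available from `now`."""
    new_queues: dict[str, list[str]] = {}
    moved: list[str] = []
    finish = 0.0
    for name, queue in queues.items():
        clock = 0.0  # projected start time of the next kept task
        kept = []
        for t in queue:
            end = clock + tasks[t] / actual_speed[name]
            if clock < now or end <= deadline:
                kept.append(t)
                clock = end
            else:
                moved.append(t)  # would miss the deadline on this resource
        new_queues[name] = kept
        finish = max(finish, clock)

    # Place moved tasks, largest first, on the cloud instance that
    # completes them earliest.
    cloud_ready = {r.name: now for r in cloud}
    cloud_speed = {r.name: r.speed for r in cloud}
    for t in sorted(moved, key=lambda t: -tasks[t]):
        best = min(cloud_ready, key=lambda n: cloud_ready[n] + tasks[t] / cloud_speed[n])
        cloud_ready[best] += tasks[t] / cloud_speed[best]
        new_queues.setdefault(best, []).append(t)
        finish = max(finish, cloud_ready[best])
    return new_queues, finish


if __name__ == "__main__":
    tasks = {"t1": 4.0, "t2": 4.0, "t3": 4.0, "t4": 4.0}
    queues, est = min_min(tasks, [Resource("g1", 1.0), Resource("g2", 1.0)])
    # g1 turns out to run at half its estimated speed.
    new_q, finish = reschedule_with_cloud(
        queues, tasks, {"g1": 0.5, "g2": 1.0}, now=2.0, deadline=est,
        cloud=[Resource("c1", 2.0)],
    )
    print(queues, est)    # {'g1': ['t1', 't3'], 'g2': ['t2', 't4']} 8.0
    print(new_q, finish)  # {'g1': ['t1'], 'g2': ['t2', 't4'], 'c1': ['t3']} 8.0
```

Tasks that have already started, or that still meet the deadline at the observed speed, stay on the grid; only the remainder is moved, which reflects the abstract's point that cloud resources are used only to cope with delays. In the example, the slowed resource g1 would otherwise finish at time 16; offloading t3 to the cloud keeps the job at its estimated completion time of 8.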
Pages: 1192-1199
Number of pages: 8