A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds

Authors
Daniel de Oliveira
Kary A. C. S. Ocaña
Fernanda Baião
Marta Mattoso
Affiliations
[1] Federal University of Rio de Janeiro - COPPE/UFRJ
[2] Federal University of the State of Rio de Janeiro – UNIRIO
Source
Journal of Grid Computing | 2012 / Vol. 10
Keywords
Cloud computing; Scientific workflow; Scientific experiment; Provenance
DOI
Not available
Abstract
In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm in which resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open and important challenges, such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task, since many different criteria have to be taken into account and the elasticity of the cloud has to be exploited to optimize workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.
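The idea of scheduling against a 3-objective cost model can be illustrated with a minimal sketch. This is not the paper's heuristic or SciCumulus code: the `VM` fields, the weights `w_time`, `w_rel`, `w_money`, and the greedy assignment rule are all assumptions made for illustration, and the three terms are combined without normalization.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical virtual machine description (all fields are assumptions)."""
    name: str
    speed: float          # work units processed per second
    failure_rate: float   # assumed probability that a task on this VM fails
    price_per_sec: float  # assumed monetary cost per second of use
    busy_until: float = 0.0  # time at which the VM becomes free


def weighted_cost(vm, work, w_time, w_rel, w_money):
    """Weighted sum of finish time, failure risk and monetary cost."""
    exec_time = work / vm.speed
    finish = vm.busy_until + exec_time
    money = exec_time * vm.price_per_sec
    return w_time * finish + w_rel * vm.failure_rate + w_money * money


def schedule(tasks, vms, w_time=1.0, w_rel=1.0, w_money=1.0):
    """Greedily assign each (name, work) task to the VM with the lowest cost."""
    plan = []
    for task_name, work in tasks:
        best = min(vms, key=lambda vm: weighted_cost(vm, work, w_time, w_rel, w_money))
        best.busy_until += work / best.speed  # the chosen VM is now busy longer
        plan.append((task_name, best.name))
    return plan
```

Changing the weights shifts the trade-off: a high `w_time` favors fast machines, while a high `w_money` favors cheap ones. In an adaptive setting, values such as `failure_rate` would presumably be refreshed from provenance data queried at runtime; that step is not shown here.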
Pages: 521-552
Page count: 31