A Data and Task Co-Scheduling Algorithm for Scientific Cloud Workflows

被引:15
作者
Deng, Kefeng [1 ]
Ren, Kaijun [1 ]
Zhu, Min [1 ]
Song, Junqiang [1 ]
机构
[1] Natl Univ Def Technol, Sch Comp, Changsha 410073, Hunan, Peoples R China
基金
中国国家自然科学基金;
关键词
Cloud computing; scientific workflow; co-scheduling; data placement; task scheduling; DATA PLACEMENT; STRATEGY;
D O I
10.1109/TCC.2015.2511745
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Cloud computing has emerged as a promising computational infrastructure for cost-efficient workflow execution by provisioning on-demand resources in a pay-as-you-go manner. While scientific workflows require accessing community-wide resources, they usually need to be performed in collaborative cloud environments composed of multiple datacenters. Although such environments facilitate scientific collaboration, the movements of input and intermediate datasets across geographically distributed datacenters may cause intolerable latency that would hinder efficient execution of large-scale data-intensive scientific workflows. To address the problem, in this article we propose a novel multi-level K-cut graph partitioning algorithm to minimize the volume of data transfer across datacenters while satisfying load balancing and fixed data constraints. The algorithm first contracts the fixed input datasets in the same datacenter and their consuming tasks, and coarsens the contracted graph to a predefined scale in a level-by-level manner. Then, a K-cut algorithm is used to partition the resulted graph into K parts such that the cut size is minimized. After that, the partitioned graph is projected back to the original workflow graph, during which the load balancing constraint is maintained. We evaluate our algorithm using three real-world workflow applications and the results demonstrate that the proposed algorithm outperforms other state-of-the-art algorithms.
引用
收藏
页码:349 / 362
页数:14
相关论文
共 51 条
  • [1] Cost-Driven Scheduling of Grid Workflows Using Partial Critical Paths
    Abrishami, Saeid
    Naghibzadeh, Mahmoud
    Epema, Dick H. J.
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2012, 23 (08) : 1400 - 1414
  • [2] [Anonymous], P INT C PAR PROC ICP
  • [3] Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices
    Aykanat, Cevdet
    Cambazoglu, B. Barla
    Ucar, Bora
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2008, 68 (05) : 609 - 625
  • [4] CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms
    Calheiros, Rodrigo N.
    Ranjan, Rajiv
    Beloglazov, Anton
    De Rose, Cesar A. F.
    Buyya, Rajkumar
    [J]. SOFTWARE-PRACTICE & EXPERIENCE, 2011, 41 (01) : 23 - 50
  • [5] An improved approximation algorithm for multiway cut
    Calinescu, G
    Karloff, H
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2000, 60 (03) : 564 - 574
  • [6] Catalyurek U.V., 2011, P 4 INT WORKSHOP DAT, P45
  • [7] Catalyurek Umit V., 2001, ACM IEEE C SUP, P28, DOI DOI 10.1145/582034.582062
  • [8] Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication
    Çatalyürek, ÜV
    Aykanat, C
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1999, 10 (07) : 673 - 693
  • [9] Chen J., 2011, P 2011 IEEE ACM 12 I, P34
  • [10] Chervenak A, 2005, 2005 6th International Workshop on Grid Computing (GRID), P1