Graph partition-based data and task co-scheduling of scientific workflow in geo-distributed datacenters

Cited by: 7
Authors
Zhang, Jinghui [1]
Chen, Jian [1]
Zhan, Jun [1]
Jin, Jiahui [1]
Song, Aibo [1]
Affiliation
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
Keywords
data transfer; graph partition; hybrid genetic algorithm; scientific workflow scheduling; DEDICATED HETEROGENEOUS MULTICLUSTER; DATA PLACEMENT; CLOUD; EXECUTION; STRATEGY; STORAGE
DOI
10.1002/cpe.5245
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Most large-scale scientific workflows run across multiple collaborating datacenters to gain access to community-wide resources, while adhering to each datacenter's non-uniform resource limits. However, moving datasets across geo-distributed datacenters, both initial input datasets with predetermined locations and intermediate datasets that need placement decisions, hinders the efficient execution of large-scale data-intensive scientific workflows. Data and task co-scheduling of a scientific workflow must therefore handle pre-placed initial input datasets, the placement of intermediate datasets, and each datacenter's non-uniform computation and storage constraints, while minimizing cross-datacenter data transfer. Since this scheduling problem is known to be NP-hard, we propose a novel approach for scheduling data-intensive scientific workflows in geo-distributed datacenters and optimizing the cross-datacenter data transfer volume. The approach builds on the multilevel graph coarsening and uncoarsening framework, combined with a specialized hybrid genetic algorithm whose repair and local improvement steps are driven by graph partitioning. Extensive simulations based on four real-world workflow traces show that our algorithm significantly reduces the overall geo-distributed data transfer, demonstrating its effectiveness.
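The objective named in the abstract, cross-datacenter data transfer under per-datacenter storage limits, can be illustrated with a minimal sketch. This is not the paper's algorithm: it only evaluates a given placement, and every name in it (`cross_dc_transfer`, `storage_ok`, the toy workflow, the datacenter labels) is a hypothetical chosen for illustration. It assumes the workflow is modelled as a graph whose nodes are tasks and datasets and whose edge weights are data sizes.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (source node, target node, data size)


def cross_dc_transfer(edges: List[Edge], placement: Dict[str, str]) -> int:
    """Total size of edges whose endpoints sit in different datacenters."""
    return sum(size for u, v, size in edges if placement[u] != placement[v])


def storage_ok(placement: Dict[str, str],
               dataset_size: Dict[str, int],
               capacity: Dict[str, int]) -> bool:
    """Check that the datasets placed in each datacenter fit its capacity."""
    used: Dict[str, int] = {}
    for node, dc in placement.items():
        # Tasks have no entry in dataset_size and contribute 0 storage.
        used[dc] = used.get(dc, 0) + dataset_size.get(node, 0)
    return all(amount <= capacity.get(dc, 0) for dc, amount in used.items())


# Toy workflow (assumed): d_in -> t1 -> d_mid -> t2
edges = [("d_in", "t1", 10), ("t1", "d_mid", 4), ("d_mid", "t2", 4)]
sizes = {"d_in": 10, "d_mid": 4}

# d_in is pre-placed in dc1 in both candidates; only the rest may move.
placement_a = {"d_in": "dc1", "t1": "dc1", "d_mid": "dc1", "t2": "dc2"}
placement_b = {"d_in": "dc1", "t1": "dc2", "d_mid": "dc2", "t2": "dc2"}

print(cross_dc_transfer(edges, placement_a))
print(cross_dc_transfer(edges, placement_b))
```

In this toy case, running the task next to the large pre-placed input and shipping only the smaller intermediate dataset gives the lower transfer volume, which is the kind of trade-off a partitioning-based scheduler searches over.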
Pages: 19