Graph partition-based data and task co-scheduling of scientific workflow in geo-distributed datacenters

Cited by: 8
Authors
Zhang, Jinghui [1]
Chen, Jian [1]
Zhan, Jun [1]
Jin, Jiahui [1]
Song, Aibo [1]
Affiliation
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
Keywords
data transfer; graph partition; hybrid genetic algorithm; scientific workflow scheduling; DEDICATED HETEROGENEOUS MULTICLUSTER; DATA PLACEMENT; CLOUD; EXECUTION; STRATEGY; STORAGE
DOI
10.1002/cpe.5245
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Most large-scale scientific workflows take place in multiple collaborative datacenters for access to community-wide resources, while adhering to each datacenter's non-uniform resource limits. However, moving both initial input datasets with predetermined locations and intermediate datasets needing placement decisions across geo-distributed datacenters hinders efficient execution of large-scale data-intensive scientific workflows. Thus, scientific workflow's data and task co-scheduling deal with situations such as pre-placed initial input datasets, placement of intermediate datasets and each datacenter's non-uniform computation and storage constraint, while minimizing the cross-datacenter data transfer. Since this scheduling problem is known to be NP-hard, here, we propose a novel approach, based on the multilevel graph coarsening and uncoarsening framework, together with a specialized hybrid genetic algorithm having distinctive graph partition driven features of repair and local improvement, for scheduling data-intensive scientific workflows in geo-distributed datacenters and optimizing the cross-datacenter data transfer volume. Extensive simulations, based on four real-world workflow traces, show that our algorithm significantly reduces the overall geo-distributed data transfer and demonstrate its effectiveness.
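The objective and constraints named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows how a candidate placement of tasks and datasets might be scored by cross-datacenter transfer volume and checked against per-datacenter limits. The graph encoding, the function names (cross_dc_transfer, within_capacity), and the toy numbers are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: each edge is (source node, target node, data volume).
Edge = Tuple[str, str, float]


def cross_dc_transfer(edges: List[Edge], placement: Dict[str, str]) -> float:
    """Total volume on edges whose two endpoints sit in different datacenters."""
    return sum(vol for u, v, vol in edges if placement[u] != placement[v])


def within_capacity(placement: Dict[str, str],
                    load: Dict[str, float],
                    capacity: Dict[str, float]) -> bool:
    """True if the summed load placed in every datacenter fits its limit."""
    used: Dict[str, float] = {}
    for node, dc in placement.items():
        used[dc] = used.get(dc, 0.0) + load.get(node, 0.0)
    return all(total <= capacity.get(dc, 0.0) for dc, total in used.items())


# Toy workflow: input dataset -> task t1 -> intermediate dataset -> task t2.
edges = [("d_in", "t1", 10.0), ("t1", "d_mid", 4.0), ("d_mid", "t2", 4.0)]
load = {"d_in": 1.0, "t1": 1.0, "d_mid": 1.0, "t2": 1.0}
capacity = {"dc0": 2.0, "dc1": 2.0}

# d_in is pre-placed in dc0 in both candidates; only the other nodes move.
plan_a = {"d_in": "dc0", "t1": "dc0", "d_mid": "dc1", "t2": "dc1"}
plan_b = {"d_in": "dc0", "t1": "dc1", "d_mid": "dc1", "t2": "dc1"}

for name, plan in (("A", plan_a), ("B", plan_b)):
    print(name, cross_dc_transfer(edges, plan), within_capacity(plan, load, capacity))
```

In this toy case, plan A cuts only the 4-unit edge and respects both limits, while plan B cuts the 10-unit input edge and overloads dc1. A partitioning or genetic search, as described in the abstract, would explore many such placements and keep the feasible one with the smallest transfer volume.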
Pages: 19