Graph partition-based data and task co-scheduling of scientific workflow in geo-distributed datacenters

Times cited: 8

Authors:

Zhang, Jinghui [1]

Chen, Jian [1]

Zhan, Jun [1]

Jin, Jiahui [1]

Song, Aibo [1]

Affiliation:

[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2019 / Vol. 31 / Issue 24

Keywords:

data transfer; graph partition; hybrid genetic algorithm; scientific workflow scheduling; DEDICATED HETEROGENEOUS MULTICLUSTER; DATA PLACEMENT; CLOUD; EXECUTION; STRATEGY; STORAGE

DOI:

10.1002/cpe.5245

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Subject classification codes:

081202; 0835

Abstract:

Most large-scale scientific workflows run across multiple collaborating datacenters to access community-wide resources, while adhering to each datacenter's non-uniform resource limits. However, moving both initial input datasets, whose locations are predetermined, and intermediate datasets, which require placement decisions, across geo-distributed datacenters hinders the efficient execution of large-scale data-intensive scientific workflows. Data and task co-scheduling for scientific workflows must therefore handle pre-placed initial input datasets, the placement of intermediate datasets, and each datacenter's non-uniform computation and storage constraints, while minimizing cross-datacenter data transfer. Because this scheduling problem is known to be NP-hard, we propose a novel approach based on a multilevel graph coarsening and uncoarsening framework, combined with a specialized hybrid genetic algorithm whose distinctive repair and local-improvement features are driven by graph partitioning, to schedule data-intensive scientific workflows in geo-distributed datacenters and minimize the cross-datacenter data transfer volume. Extensive simulations based on four real-world workflow traces show that our algorithm significantly reduces overall geo-distributed data transfer, demonstrating its effectiveness.
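To make the optimization target concrete, the following is a minimal sketch of how a candidate co-schedule could be evaluated. It is an illustration under stated assumptions, not the authors' algorithm: it models the workflow as datasets linked to the tasks that produce or consume them, charges a dataset's size once for each distinct remote datacenter that runs one of those tasks, and checks the pre-placed inputs and per-datacenter computation and storage limits. The names `Workflow`, `transfer_volume` and `is_feasible`, the cost model, and the capacity units are all hypothetical. A graph-partitioning scheduler of the kind described would search over `task_dc` and `data_dc` to minimize `transfer_volume` subject to `is_feasible`.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical workflow model: datasets linked to the tasks that use them."""
    dataset_size: dict[str, float]       # dataset -> size (e.g. GB)
    dataset_users: dict[str, set[str]]   # dataset -> tasks that produce or consume it
    task_load: dict[str, float]          # task -> computation demand (abstract units)
    fixed_location: dict[str, str] = field(default_factory=dict)  # pre-placed initial inputs


def transfer_volume(wf: Workflow, task_dc: dict[str, str], data_dc: dict[str, str]) -> float:
    """Assumed cost model: each dataset is shipped once to every distinct
    remote datacenter that runs one of its producer/consumer tasks."""
    total = 0.0
    for d, users in wf.dataset_users.items():
        home = data_dc[d]
        remote_dcs = {task_dc[t] for t in users} - {home}
        total += wf.dataset_size[d] * len(remote_dcs)
    return total


def is_feasible(wf: Workflow, task_dc: dict[str, str], data_dc: dict[str, str],
                compute_cap: dict[str, float], storage_cap: dict[str, float]) -> bool:
    # Initial input datasets must stay at their predetermined datacenters.
    if any(data_dc[d] != dc for d, dc in wf.fixed_location.items()):
        return False
    storage: dict[str, float] = defaultdict(float)
    for d, dc in data_dc.items():
        storage[dc] += wf.dataset_size[d]
    compute: dict[str, float] = defaultdict(float)
    for t, dc in task_dc.items():
        compute[dc] += wf.task_load[t]
    # Non-uniform per-datacenter limits; a datacenter missing from a
    # capacity map is treated as having zero capacity.
    storage_ok = all(used <= storage_cap.get(dc, 0.0) for dc, used in storage.items())
    compute_ok = all(used <= compute_cap.get(dc, 0.0) for dc, used in compute.items())
    return storage_ok and compute_ok


if __name__ == "__main__":
    wf = Workflow(
        dataset_size={"in1": 10.0, "mid": 4.0, "out": 1.0},
        dataset_users={"in1": {"t1"}, "mid": {"t1", "t2"}, "out": {"t2"}},
        task_load={"t1": 1.0, "t2": 1.0},
        fixed_location={"in1": "DC1"},
    )
    compute_cap = {"DC1": 1.0, "DC2": 1.0}
    storage_cap = {"DC1": 20.0, "DC2": 20.0}

    split = ({"t1": "DC1", "t2": "DC2"}, {"in1": "DC1", "mid": "DC1", "out": "DC2"})
    packed = ({"t1": "DC1", "t2": "DC1"}, {"in1": "DC1", "mid": "DC1", "out": "DC1"})
    for name, (task_dc, data_dc) in [("split", split), ("packed", packed)]:
        print(name, transfer_volume(wf, task_dc, data_dc),
              is_feasible(wf, task_dc, data_dc, compute_cap, storage_cap))
    # prints "split 4.0 True" then "packed 0.0 False"
```

The two toy candidates show the tension the abstract describes: packing every task and dataset into DC1 eliminates transfer but exceeds DC1's computation limit, so the feasible schedule pays 4.0 units to move the intermediate dataset `mid` to DC2.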

Pages: 19

References: 43

