Toward efficient execution of data-intensive workflows

Cited by: 6
Authors
Sukhoroslov, Oleg [1]
Affiliations
[1] Russian Acad Sci, Inst Informat Transmiss Problems, Moscow, Russia
Funding
Russian Science Foundation
Keywords
Workflows; Data-intensive computing; Task scheduling; Data management; Simulation; CLOUD; ALGORITHMS;
DOI
10.1007/s11227-020-03612-4
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Workflows that consume and produce large amounts of data are widely used in modern scientific computing and data processing pipelines. Scheduling data-intensive workflows requires careful management of data transfers between tasks, since network contention can significantly increase workflow execution time. The paper presents and evaluates several scheduling algorithms, data transfer strategies and optimizations aimed at efficient execution of data-intensive workflows. The studied approaches reduce or completely avoid network contention by explicitly scheduling data transfers, and they incorporate several optimizations, such as data caching and chunked and peer-to-peer data transfers. The results of the experimental study demonstrate that the relative performance of the different approaches depends on the workflow properties, data staging strategy and system configuration. The proposed CAS-L1 heuristic with additional data transfer optimizations achieves the best results.
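As a rough illustration of why explicit scheduling of transfers can help, the following toy sketch compares two ways of sending files over one shared link. It is not the paper's CAS-L1 heuristic or its simulation model; the function names are hypothetical, and the model assumes an idealized link that divides bandwidth equally among concurrent transfers.

```python
def fair_share_finish_times(sizes, bandwidth):
    """Finish times when all transfers start at t=0 and contend for the link.

    Assumption: the link splits its bandwidth equally among active transfers.
    """
    finish = [0.0] * len(sizes)
    t = 0.0          # current time
    sent = 0.0       # amount already sent by every still-active transfer
    active = len(sizes)
    # Transfers complete in order of size; process them smallest first.
    for size, i in sorted((s, i) for i, s in enumerate(sizes)):
        # Each active transfer progresses at bandwidth / active.
        t += (size - sent) * active / bandwidth
        sent = size
        finish[i] = t
        active -= 1
    return finish


def sequential_finish_times(sizes, bandwidth):
    """Finish times when transfers are scheduled one after another."""
    finish = []
    t = 0.0
    for size in sizes:
        t += size / bandwidth   # each transfer gets the full link
        finish.append(t)
    return finish


if __name__ == "__main__":
    sizes = [100, 100]   # hypothetical file sizes (MB)
    bandwidth = 10       # hypothetical link bandwidth (MB/s)
    print("contended :", fair_share_finish_times(sizes, bandwidth))
    print("sequential:", sequential_finish_times(sizes, bandwidth))
```

In this toy model the last transfer finishes at the same time in both cases, but with sequential scheduling the first file arrives earlier, so a task that depends only on that file could start sooner.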
Pages: 7989-8012
Page count: 24