Data locality-aware and QoS-aware dynamic cloud workflow scheduling in Hadoop for heterogeneous environment

Cited by: 4
Authors
Ding, Fan [1]
Ma, Minjin [2]
Affiliations
[1] Lanzhou Univ Technol, Coll Comp & Commun Engn, 287 Langongping Rd, Lanzhou 730050, Gansu, Peoples R China
[2] Lanzhou Univ, Coll Atmospher Sci, 222 South Tianshui Rd, Lanzhou 730000, Gansu, Peoples R China
Keywords
data locality; Hadoop MapReduce; heterogeneous; workflow scheduling; quality of service; QoS; big data
DOI
10.1504/IJWGS.2023.129338
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hadoop has become a popular data-parallel computing framework for data-intensive scientific applications in recent years. Most scientific applications employ workflows to describe procedures and dependencies between jobs. However, the current default scheduling policy in Hadoop does not take data locality into account, and the movement of data among virtual machines (VMs) adds latency to workflow scheduling. In addition, because cloud resources are heterogeneous and dynamic, static workflow scheduling cannot satisfy the user's demand for quality of service (QoS). Hence, we propose a data locality-aware and QoS-aware dynamic cloud workflow scheduling algorithm (DQ-DCWS) based on dynamic programming. The algorithm balances data locality and delays by grouping nodes that hold tasks correlated with data blocks. We consider five QoS factors and normalise them into a path optimisation problem to maximise QoS. DQ-DCWS is implemented and validated by running the Montage workflow on real Hadoop clusters deployed on Amazon EC2.
Pages: 113-135
Number of pages: 24