An optimized MapReduce workflow scheduling algorithm for heterogeneous computing

Cited by: 0
Authors
Zhuo Tang
Min Liu
Almoalmi Ammar
Kenli Li
Keqin Li
Affiliations
[1] Hunan University, College of Information Science and Engineering
[2] State University of New York, Department of Computer Science
Source
The Journal of Supercomputing | 2016 / Vol. 72
Keywords
Hadoop; Heterogeneous cluster; MapReduce; Scheduling; Workflow;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The MapReduce framework is considered an effective solution for large-scale parallel data processing. This paper treats a massive data processing workflow as a directed acyclic graph (DAG) of MapReduce jobs. In a heterogeneous computing environment, the computation speed of a given slot can differ from job to job. To address this problem, the paper proposes an optimized MapReduce workflow scheduling algorithm comprising a job prioritizing phase and a task assignment phase. First, jobs are classified as I/O-intensive or computing-intensive, and the priority of each job is computed according to its type. Then, suitable slots are allocated for each block, and the MapReduce tasks in the workflow are scheduled with respect to data locality. Experimental results show that the algorithm improves task scheduling performance and the rationality of resource allocation in heterogeneous computing environments.
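The two phases described in the abstract can be illustrated with a small Python sketch. The abstract gives neither the priority formula nor the slot-selection rule, so everything here is an assumption made for illustration: the `Job` fields, the cost-comparison rule for classifying jobs, the priority computed in the style of upward-rank list scheduling (own dominant cost plus the largest successor priority), and the "prefer a node that holds the block" rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """One MapReduce job in the workflow DAG (all fields are hypothetical)."""
    name: str
    io_cost: float   # assumed estimate of I/O time
    cpu_cost: float  # assumed estimate of compute time
    successors: list = field(default_factory=list)  # names of downstream jobs


def job_type(job):
    """Classify a job as I/O-intensive ('io') or computing-intensive ('cpu')."""
    return "io" if job.io_cost > job.cpu_cost else "cpu"


def priorities(jobs):
    """Phase 1 (assumed rule): priority = dominant cost + max successor priority."""
    memo = {}

    def rank(name):
        if name not in memo:
            job = jobs[name]
            own = job.io_cost if job_type(job) == "io" else job.cpu_cost
            memo[name] = own + max((rank(s) for s in job.successors), default=0.0)
        return memo[name]

    for name in jobs:
        rank(name)
    return memo


def assign_blocks(block_locations, slots_free):
    """Phase 2 (assumed rule): give each block a free slot, preferring local nodes.

    block_locations maps a block id to the nodes that hold it;
    slots_free maps a node to its number of free slots.
    """
    free = dict(slots_free)  # work on a copy so the caller's dict is unchanged
    assignment = {}
    for block, nodes in block_locations.items():
        local = [n for n in nodes if free.get(n, 0) > 0]
        candidates = local or [n for n, count in free.items() if count > 0]
        if not candidates:
            assignment[block] = None  # no free slot anywhere
            continue
        node = max(candidates, key=lambda n: free[n])
        free[node] -= 1
        assignment[block] = node
    return assignment


if __name__ == "__main__":
    jobs = {
        "a": Job("a", io_cost=5, cpu_cost=2, successors=["b", "c"]),
        "b": Job("b", io_cost=1, cpu_cost=4),
        "c": Job("c", io_cost=3, cpu_cost=1),
    }
    prio = priorities(jobs)
    order = sorted(jobs, key=lambda n: prio[n], reverse=True)
    print(order, prio)
    print(assign_blocks({"blk1": ["n1"], "blk2": ["n1"], "blk3": ["n2"]},
                        {"n1": 1, "n2": 2}))
```

Jobs would be dispatched in descending priority order, and a block falls back to a non-local node only when no node holding it has a free slot. A real scheduler would also need per-slot speed estimates for each job type, which the abstract mentions but does not specify.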
Pages: 2059-2079
Page count: 20