Multi-objective scheduling of MapReduce jobs in big data processing

Cited by: 0
Authors
Ibrahim Abaker Targio Hashem
Nor Badrul Anuar
Mohsen Marjani
Abdullah Gani
Arun Kumar Sangaiah
Adewole Kayode Sakariyah
Affiliations
[1] University of Malaya, Faculty of Computer Science and Information Technology
[2] VIT University, School of Computing Science and Engineering
Source
Multimedia Tools and Applications | 2018 / Vol. 77
Keywords
Hadoop; MapReduce; Cloud computing; Big data; Scheduling algorithms
DOI
Not available
Abstract
Data generation has increased drastically over the past few years due to the rapid development of Internet-based technologies. This period has been called the big data era. Big data offer an emerging paradigm shift in data exploration and utilization. The MapReduce computational paradigm is a well-known framework and is considered the main enabler for the distributed and scalable processing of a large amount of data. However, despite recent efforts toward improving the performance of MapReduce, scheduling MapReduce jobs across multiple nodes has been considered a multi-objective optimization problem. This problem can become increasingly complex when virtualized clusters in cloud computing are used to execute a large number of tasks. This study aims to optimize MapReduce job scheduling based on the completion time and cost of cloud service models. First, the problem is formulated as a multi-objective model. The model consists of two objective functions, namely, (i) completion time and (ii) cost minimization. Second, a scheduling algorithm using earliest finish time scheduling that considers resource allocation and job scheduling in the cloud is proposed. Lastly, experimental results show that the proposed scheduler exhibits better performance than other well-known schedulers, such as FIFO and Fair.
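The abstract names two objectives (completion time and cost) and an earliest-finish-time scheduling rule but gives no formulas. As a rough illustration only, and not the authors' algorithm, the following minimal sketch assigns each task to the node on which it would finish earliest and breaks ties by the lower monetary cost. The `Node` fields, the per-second billing model, and the function name `schedule_eft` are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical cloud node; both rates are illustrative assumptions."""
    name: str
    speed: float          # work units processed per second
    price_per_sec: float  # cost charged per second of busy time


def schedule_eft(task_sizes, nodes):
    """Greedy earliest-finish-time assignment (illustrative sketch).

    Tasks are taken in the given order. Each task goes to the node where it
    would finish earliest; ties are broken by the lower cost of running it.
    Returns (assignment, makespan, total_cost).
    """
    ready_at = {n.name: 0.0 for n in nodes}  # time at which each node is free
    assignment = []
    total_cost = 0.0

    for idx, size in enumerate(task_sizes):
        def finish_then_cost(node, size=size):
            run_time = size / node.speed
            return (ready_at[node.name] + run_time,
                    run_time * node.price_per_sec)

        best = min(nodes, key=finish_then_cost)
        run_time = size / best.speed
        ready_at[best.name] += run_time
        total_cost += run_time * best.price_per_sec
        assignment.append((idx, best.name))

    makespan = max(ready_at.values())  # completion time of the whole job
    return assignment, makespan, total_cost


if __name__ == "__main__":
    nodes = [Node("fast", speed=2.0, price_per_sec=3.0),
             Node("slow", speed=1.0, price_per_sec=1.0)]
    print(schedule_eft([4, 4, 2], nodes))
```

In this toy run the second task would finish at the same time on either node, so the cheaper node is chosen. A real multi-objective scheduler would typically weigh the two objectives explicitly rather than use cost only as a tie-breaker.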
Pages: 9979-9994
Page count: 15