A strategy for scheduling reduce task based on intermediate data locality of the MapReduce

Cited by: 0
Authors
Fengjun Shang
Xuanling Chen
Chenyun Yan
Affiliation
[1] Chongqing University of Posts and Telecommunications, Institute of Computer Network Engineering
Source
Cluster Computing | 2017 / Vol. 20
Keywords
Hadoop; Task scheduling; Data locality; Bandwidth savings
DOI
Not available
Abstract
This paper studies task scheduling, from the perspective of resource allocation and management, as a way to improve the performance of the Hadoop system. To save network bandwidth in a Hadoop cluster environment and improve Hadoop performance, an improved ReduceTask scheduling strategy based on data locality is proposed. During the MapReduce stage, two main data streams traverse the cluster network: slow-task migration and remote copies of data. These two overlapping bursts of data transfer can easily become bottlenecks of the cluster network. To reduce the amount of remotely copied data, we combine data locality with a minimum network resource consumption model (MNRC), which calculates the network resource consumption of a ReduceTask. Based on this model, we design a delay priority scheduling policy for ReduceTasks that is driven by the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC saves cluster network resources by an average of 7.5% in heterogeneous environments.
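The idea of a locality-driven cost combined with a delay policy can be illustrated with a minimal sketch. It is not the paper's MNRC formulation, which the abstract does not give. It assumes the cost of placing a ReduceTask on a node is simply the number of intermediate bytes that must be copied from other nodes, and that a task may be withheld from a costly node for a bounded number of scheduling rounds. All names (`remote_copy_cost`, `best_node`, `should_assign`, `max_skips`, `tolerance`) are hypothetical.

```python
from typing import Dict


def remote_copy_cost(partition_bytes: Dict[str, int], node: str) -> int:
    """Bytes a reduce task would fetch over the network if it ran on `node`.

    `partition_bytes` maps each node to the size of the intermediate data
    for this reduce partition that is stored locally on that node.
    """
    return sum(size for host, size in partition_bytes.items() if host != node)


def best_node(partition_bytes: Dict[str, int]) -> str:
    """Node with the minimum remote copy cost (ties broken by node name)."""
    return min(sorted(partition_bytes),
               key=lambda n: remote_copy_cost(partition_bytes, n))


def should_assign(partition_bytes: Dict[str, int], requesting_node: str,
                  skipped: int, max_skips: int = 3,
                  tolerance: float = 1.1) -> bool:
    """Decide whether to give the reduce task to `requesting_node` now.

    Assign immediately if the node's cost is within `tolerance` of the best
    achievable cost; otherwise delay, but only up to `max_skips` rounds so
    the task cannot starve. Both thresholds are illustrative assumptions.
    """
    cost_here = remote_copy_cost(partition_bytes, requesting_node)
    cost_best = remote_copy_cost(partition_bytes, best_node(partition_bytes))
    if cost_here <= cost_best * tolerance:
        return True
    return skipped >= max_skips


# Example: intermediate data for one partition spread over three nodes.
sizes = {"n1": 600, "n2": 300, "n3": 100}
decision_now = should_assign(sizes, "n2", skipped=0)    # n2 is costly: delay
decision_later = should_assign(sizes, "n2", skipped=3)  # waited long enough
```

In this sketch the delay bound trades locality for progress: a larger `max_skips` saves more bandwidth at the risk of leaving the task unscheduled for longer.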
Pages: 2821-2831
Page count: 10