On Traffic-Aware Partition and Aggregation in MapReduce for Big Data Applications

Cited by: 34
Authors
Ke, Huan [1]
Li, Peng [1]
Guo, Song [1]
Guo, Minyi [2]
Affiliations
[1] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu 8580, Japan
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
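Funding
National Natural Science Foundation of China
Keywords
MapReduce; partition; aggregation; big data; Lagrangian decomposition
DOI
10.1109/TPDS.2015.2419671
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract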
The MapReduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance enhancement. Traditionally, a hash function is used to partition intermediate data among reduce tasks; this is not traffic-efficient, however, because neither the network topology nor the data size associated with each key is taken into consideration. In this paper, we study how to reduce the network traffic cost of a MapReduce job by designing a novel intermediate data partition scheme. Furthermore, we jointly consider the aggregator placement problem, where each aggregator can reduce merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to deal with the large-scale optimization problem for big data applications, and an online algorithm is also designed to adjust data partition and aggregation in a dynamic manner. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost in both offline and online cases.
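The contrast the abstract draws between hash partitioning and a traffic-aware partition can be illustrated with a small sketch. This is not the paper's algorithm: it is a toy per-key greedy assignment that ignores reducer load balance and aggregator placement. The cost model (data size times mapper-to-reducer distance) and every name here (`hash_partition`, `greedy_partition`, `traffic_cost`, the sample sizes and distances) are assumptions made for illustration only.

```python
import zlib


def hash_partition(key, num_reducers):
    """Deterministic stand-in for a default hash partitioner (assumed, not Hadoop's)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def traffic_cost(assignment, sizes, distance):
    """Sum of size * distance over every (key, mapper) pair.

    assignment: key -> reducer index
    sizes:      key -> {mapper index: intermediate data size for that key}
    distance:   distance[mapper][reducer], an assumed hop-count-like cost
    """
    total = 0
    for key, reducer in assignment.items():
        for mapper, size in sizes[key].items():
            total += size * distance[mapper][reducer]
    return total


def greedy_partition(sizes, distance, num_reducers):
    """Send each key to the reducer that minimizes that key's own traffic cost.

    Assumption: no load-balance constraint, so keys can be decided independently.
    """
    assignment = {}
    for key, per_mapper in sizes.items():
        assignment[key] = min(
            range(num_reducers),
            key=lambda r: sum(s * distance[m][r] for m, s in per_mapper.items()),
        )
    return assignment


# Hypothetical input: two mappers, two reducers, reducer i is close to mapper i.
distance = [[1, 3], [3, 1]]
sizes = {"a": {0: 10, 1: 1}, "b": {0: 1, 1: 10}}

hashed = {k: hash_partition(k, 2) for k in sizes}
greedy = greedy_partition(sizes, distance, 2)

hash_cost = traffic_cost(hashed, sizes, distance)
greedy_cost = traffic_cost(greedy, sizes, distance)
print("hash cost:", hash_cost, "greedy cost:", greedy_cost)
```

Because the hash partitioner never looks at `sizes` or `distance`, its cost can only match or exceed the per-key greedy cost under this simplified model. The paper's formulation additionally places aggregators and is solved with a decomposition-based distributed algorithm, which this sketch does not attempt.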
Pages: 818-828
Number of pages: 11