On Traffic-Aware Partition and Aggregation in MapReduce for Big Data Applications

被引:34
作者
Ke, Huan [1 ]
Li, Peng [1 ]
Guo, Song [1 ]
Guo, Minyi [2 ]
机构
[1] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu 8580, Japan
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
基金
中国国家自然科学基金;
关键词
MapReduce; partition; aggregation; big data; lagrangian decomposition;
D O I
10.1109/TPDS.2015.2419671
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The MapReduce programming model simplifies large-scale data processing on commodity cluster by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance enhancement. Traditionally, a hash function is used to partition intermediate data among reduce tasks, which, however, is not traffic-efficient because network topology and data size associated with each key are not taken into consideration. In this paper, we study to reduce network traffic cost for a MapReduce job by designing a novel intermediate data partition scheme. Furthermore, we jointly consider the aggregator placement problem, where each aggregator can reduce merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to deal with the large-scale optimization problem for big data application and an online algorithm is also designed to adjust data partition and aggregation in a dynamic manner. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost under both offline and online cases.
引用
收藏
页码:818 / 828
页数:11
相关论文
共 25 条
[1]   MapReduce with communication overlap (MaRCO) [J].
Ahmad, Faraz ;
Lee, Seyong ;
Thottethodi, Mithuna ;
Vijaykumar, T. N. .
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2013, 73 (05) :608-620
[2]  
[Anonymous], 2010, SYNTHESIS LECT HUMAN, DOI DOI 10.2200/S00274ED1V01Y201006HLT007
[3]  
[Anonymous], 2008, IRPTR0805
[4]  
[Anonymous], 2012, P NSDI 12
[5]  
[Anonymous], 2012, PROC 9 USENIX C NETW
[6]  
[Anonymous], 2009, Hadoop: The Definitive Guide
[7]  
[Anonymous], 2010, NSDI
[8]  
Chen FF, 2012, IEEE INFOCOM SER, P1143, DOI 10.1109/INFCOM.2012.6195473
[9]  
Condie T., 2010, SIGMOD, P1115, DOI DOI 10.1145/1807167.1807295
[10]  
Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDE '04), P137