An Improved Algorithm for Optimizing MapReduce Based on Locality and Overlapping

Cited by: 8
Authors
Li, Jianjiang [1]
Wang, Jie [1]
Lyu, Bin [1,2]
Wu, Jie [3]
Yang, Xiaolei [1]
Affiliations
[1] Univ Sci & Technol Beijing, Dept Comp Sci & Technol, Beijing 100083, Peoples R China
[2] Univ Southern Calif, Los Angeles, CA 90089 USA
[3] Temple Univ, Dept Comp & Informat Sci, Philadelphia, PA 19122 USA
Funding
National Key Research and Development Program of China;
Keywords
MapReduce; overlapping; load balance; data locality;
DOI
10.26599/TST.2018.9010115
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
MapReduce is currently the most popular programming model for big data processing, and Hadoop is a well-known MapReduce implementation platform. However, Hadoop jobs suffer from imbalanced workloads during the reduce phase and inefficiently utilize the available computing and network resources. In some cases, these problems lead to serious performance degradation in MapReduce jobs. To resolve these problems, in this paper, we propose two algorithms, the Locality-Based Balanced Schedule (LBBS) and Overlapping-Based Resource Utilization (OBRU), that optimize the Locality-Enhanced Load Balance (LELB) and the Map, Local reduce, Shuffle, and final Reduce (MLSR) phases. The LBBS collects partition information from input data during the map phase and generates balanced schedule plans for the reduce phase. OBRU is responsible for using computing and network resources efficiently by overlapping the local reduce, shuffle, and final reduce phases. Experimental results show that the LBBS and OBRU algorithms yield significant improvements in load balancing. When LBBS and OBRU are applied, job performance increases by 15% from that of models using LELB and MLSR.
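The abstract says that LBBS gathers partition information during the map phase and uses it to produce a balanced plan for the reduce phase, but it does not give the algorithm itself. The following is a minimal sketch of that general idea only, not the authors' LBBS: it assumes a greedy largest-first assignment of partitions to reducers, with data locality used as a tie-breaker. The function name `balanced_schedule`, the input layout (partition -> node -> bytes), and the sample figures are all hypothetical.

```python
def balanced_schedule(partition_sizes, reducers):
    """Greedy sketch: assign each partition to a reducer node.

    partition_sizes: dict mapping partition id -> {node id: bytes of that
        partition produced by map tasks on that node} (hypothetical layout).
    reducers: list of node ids that will run reduce tasks.
    Returns (plan, load): partition -> reducer, and reducer -> total bytes.
    """
    load = {r: 0 for r in reducers}
    plan = {}
    totals = {p: sum(by_node.values()) for p, by_node in partition_sizes.items()}

    # Place the largest partitions first so the small ones can even out the load.
    for p in sorted(totals, key=lambda q: (-totals[q], q)):
        by_node = partition_sizes[p]
        # Prefer the least-loaded reducer; among equally loaded reducers,
        # prefer the one that already holds the most of this partition.
        target = min(reducers, key=lambda r: (load[r], -by_node.get(r, 0)))
        plan[p] = target
        load[target] += totals[p]
    return plan, load


# Illustrative input: made-up sizes for four partitions on two nodes.
sample = {
    "p0": {"n1": 50, "n2": 10},
    "p1": {"n1": 5, "n2": 35},
    "p2": {"n1": 20, "n2": 10},
    "p3": {"n2": 20},
}
plan, load = balanced_schedule(sample, ["n1", "n2"])
print(plan)
print(load)
```

In this sketch load balance always takes priority and locality only breaks ties; a scheduler that weighs shuffle cost against imbalance would need a combined cost function, which the abstract does not describe.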
Pages: 744-753
Number of pages: 10