Improving MapReduce Performance by Balancing Skewed Loads

Cited by: 19
Authors
Fan Yuanquan [1]
Wu Weiguo [1]
Xu Yunlong [1]
Chen Heng [1]
Affiliation
[1] Xi'an Jiaotong University, Department of Computer Science and Technology, Xi'an 710049, Shaanxi Province, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
MapReduce; cloud computing; skewed loads; performance prediction; support vector machines
DOI
10.1109/CC.2014.6911091
Chinese Library Classification number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract
MapReduce has emerged as a popular computing model in data centers for processing large datasets. In the map phase, hash partitioning distributes intermediate data across data-center-scale cluster nodes so that all records sharing the same key are sent to the same reduce task. However, we observe that this approach can lead to uneven data distribution, which can result in skewed loads among reduce tasks and thus hamper the performance of MapReduce systems. Moreover, worker nodes in MapReduce systems may differ in computing capability due to (1) multiple generations of hardware in non-virtualized data centers, or (2) co-location of virtual machines in virtualized data centers. This heterogeneity among cluster nodes exacerbates the negative effects of uneven data distribution. To improve MapReduce performance in heterogeneous clusters, we propose a novel load-balancing approach for the reduce phase. The approach consists of two components: (1) performance prediction, based on support vector machine (SVM) models, for reducers running on heterogeneous nodes, and (2) heterogeneity-aware partitioning (HAP), which balances skewed data among reduce tasks. We implement this approach as a plug-in for the current MapReduce system. Experimental results demonstrate that the proposed approach distributes work evenly among reduce tasks and improves MapReduce performance with little overhead.
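The skew problem described in the abstract can be illustrated with a small Python sketch. It applies the formula of Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, with Java's String.hashCode() reimplemented to stand in for the key's hashCode(). It then compares the result with a simple greedy, heterogeneity-aware assignment. The greedy rule, the function names, and the relative node speeds are assumptions for illustration only. This is not the authors' HAP algorithm, and the fixed speeds list merely stands in for the per-node performance predictions the paper obtains from SVM models.

```python
from typing import Dict, List, Tuple


def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): h = 31*h + c over the characters, wrapped
    to a signed 32-bit int. Assumes BMP text, where ord(c) equals the
    UTF-16 code unit Java would use."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - 0x100000000 if h >= 0x80000000 else h  # reinterpret as signed


def hash_partition(key: str, num_reducers: int) -> int:
    """Default hash partitioning: (hashCode & Integer.MAX_VALUE) % R."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def hash_loads(key_counts: Dict[str, int], num_reducers: int) -> List[int]:
    """Number of records each reducer receives under hash partitioning."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[hash_partition(key, num_reducers)] += count
    return loads


def greedy_heterogeneity_aware(
    key_counts: Dict[str, int], speeds: List[float]
) -> Tuple[Dict[str, int], List[int]]:
    """Hypothetical greedy balancer (NOT the paper's HAP): place keys, largest
    first, on the reducer whose estimated finish time (load / speed) would be
    smallest after adding the key. A key is never split across reducers."""
    loads = [0] * len(speeds)
    assignment: Dict[str, int] = {}
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # min() returns the lowest reducer index on ties
        best = min(range(len(speeds)), key=lambda r: (loads[r] + count) / speeds[r])
        loads[best] += count
        assignment[key] = best
    return assignment, loads


def makespan(loads: List[int], speeds: List[float]) -> float:
    """Estimated reduce-phase completion time: the slowest reducer finishes last."""
    return max(load / speed for load, speed in zip(loads, speeds))


if __name__ == "__main__":
    counts = {"a": 600, "b": 100, "c": 100, "d": 300, "e": 50, "f": 50}
    speeds = [1.0, 1.0, 2.0]  # hypothetical relative node speeds
    h = hash_loads(counts, len(speeds))
    _, g = greedy_heterogeneity_aware(counts, speeds)
    print("hash:", h, makespan(h, speeds))
    print("greedy:", g, makespan(g, speeds))
```

Running it prints `hash: [150, 900, 150] 900.0` and `greedy: [300, 300, 600] 300.0`. Hash partitioning sends the two heaviest keys, a and d, to the same reducer. The greedy rule instead gives the fastest node twice as many records as the others, so all three reducers are estimated to finish at the same time. Keys are kept whole because every record with a given key must reach a single reducer.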
Pages: 85-108
Number of pages: 24