Minimizing Skew in MapReduce Applications using Node Clustering in Heterogeneous Environment

Cited by: 3

Authors:

Nawale, Vishal Ankush [1]

Deshpande, Priya [1]

Affiliations:

[1] MIT College of Engineering, Information Technology, Pune, Maharashtra, India

Source:

2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN) | 2015

Keywords:

MapReduce; Hadoop; data skew; computational power;

DOI:

10.1109/CICN.2015.35

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present an automatic skew minimization approach for MapReduce programs, together with a proposed system that implements this approach as a replacement for an existing MapReduce implementation. The proposed system works as follows. The intermediate output of the map tasks may contain data skew in its records. To handle this, we create two sets of nodes, one of high computational power and one of low computational power, and assign the skewed records to the high-power nodes and the remaining records to the low-power nodes for the subsequent reduce tasks. We implement the proposed system as an extension to Hadoop and evaluate its effectiveness using real applications. The results show that, in a heterogeneous environment, the proposed system can reduce job runtime in the presence of skew and adds little to no overhead in the absence of skew.
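The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Hadoop extension: the abstract does not say how skew is detected or how node power is measured, so the function names, the `power_threshold` cut-off, the `skew_factor` rule (a key counts as skewed when its record count exceeds `skew_factor` times the mean), and the round-robin placement are all assumptions made for illustration.

```python
from itertools import cycle


def split_nodes(node_power, power_threshold):
    """Partition nodes into high- and low-power sets.

    node_power maps a node name to a hypothetical power score;
    power_threshold is an assumed cut-off, not taken from the paper.
    """
    high = sorted(n for n, p in node_power.items() if p >= power_threshold)
    low = sorted(n for n, p in node_power.items() if p < power_threshold)
    return high, low


def assign_keys(key_counts, high_nodes, low_nodes, skew_factor=2.0):
    """Map each intermediate key to a reduce node.

    Keys whose record count exceeds skew_factor * mean are treated as
    skewed (an assumed rule) and go to high-power nodes; all other keys
    go to low-power nodes. Placement within a set is round-robin.
    """
    if not key_counts:
        return {}
    if not high_nodes and not low_nodes:
        raise ValueError("at least one node is required")
    mean = sum(key_counts.values()) / len(key_counts)
    # If one set is empty, fall back to the other so every key is placed.
    high_cycle = cycle(high_nodes or low_nodes)
    low_cycle = cycle(low_nodes or high_nodes)
    assignment = {}
    # Visit the heaviest keys first; ties are broken by key for determinism.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count > skew_factor * mean:
            assignment[key] = next(high_cycle)
        else:
            assignment[key] = next(low_cycle)
    return assignment


# Illustrative data only: one heavy key and three light keys.
key_counts = {"a": 100, "b": 5, "c": 4, "d": 3}
node_power = {"n1": 8, "n2": 2, "n3": 2}
high, low = split_nodes(node_power, power_threshold=4)
assignment = assign_keys(key_counts, high, low)
```

With these made-up inputs, the single heavy key lands on the one high-power node and the light keys are spread over the two low-power nodes. A real implementation would have to plug into Hadoop's partitioning stage and obtain key statistics from the map output, which this sketch does not attempt.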


Pages: 136-139

Page count: 4
