Optimizing data placement in heterogeneous Hadoop clusters

Cited by: 0
Authors
Runqun Xiong
Junzhou Luo
Fang Dong
Affiliation
[1] Southeast University, School of Computer Science and Engineering
Source
Cluster Computing | 2015 / Vol. 18
Keywords
Hadoop cluster; HDFS; Data placement; Heterogeneous; Replica
DOI
Not available
Abstract
Data placement decisions in the Hadoop distributed file system (HDFS) are critical to data locality, which is a primary criterion for task scheduling in the MapReduce model and ultimately affects application performance. HDFS's existing rack-aware data placement strategy and replication scheme work well with the MapReduce framework in homogeneous Hadoop clusters, but in heterogeneous environments this placement policy can noticeably reduce MapReduce performance and may increase energy dissipation. In addition, HDFS applies a fixed default replication factor to every data block, which wastes storage space when the Hadoop system holds a large amount of inactive data. In this paper, we propose a novel data placement strategy (SLDP) for heterogeneous Hadoop clusters. SLDP first uses a heterogeneity-aware algorithm to divide the nodes into several virtual storage tiers (VSTs), and then places data blocks across the nodes of each VST in a circular fashion according to the hotness of the data. Furthermore, SLDP uses hotness-proportional replication to save disk space and also provides an effective power control function. Experimental results on two real data-intensive applications show that SLDP is energy-efficient, space-saving, and able to significantly improve MapReduce performance in a heterogeneous Hadoop cluster.
Pages: 1465-1480
Page count: 15