IDP: An Innovative Data Placement Algorithm for Hadoop Systems

Cited by: 2
Authors
Lee, Chia-Wei [1]
Huang, Horng-Chyau [1]
Hsieh, Sun-Yuan [1, 2, 3]
Affiliations
[1] Natl Cheng Kung Univ, Inst Med Informat, 1 Univ Rd, Tainan 701, Taiwan
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
[3] Natl Cheng Kung Univ, Inst Mfg Informat & Syst, Tainan 701, Taiwan
Source
INTELLIGENT SYSTEMS AND APPLICATIONS (ICS 2014) | 2015, Vol. 274
Keywords
Data Placement; Hadoop; Heterogeneous; MapReduce
DOI
10.3233/978-1-61499-484-8-49
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a data placement strategy to address the imbalanced workload problem on DataNodes. Based on the computing capability of each node in a heterogeneous Hadoop cluster, the proposed strategy balances the data stored on each DataNode so that the time spent on data transfer is substantially reduced. As a result, overall Hadoop performance improves. Experimental results demonstrate that the proposed data placement strategy significantly decreases execution time and thus improves Hadoop performance in a heterogeneous cluster.
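The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea it describes: assigning data blocks to nodes in proportion to each node's computing capability. The function name `allocate_blocks`, the capability scores, and the largest-remainder rounding are all assumptions made for illustration, not the authors' method.

```python
def allocate_blocks(capabilities, total_blocks):
    """Split total_blocks across nodes in proportion to capability.

    capabilities: dict mapping a (hypothetical) node name to a positive
    relative computing-capability score.
    Uses largest-remainder rounding so the counts sum to total_blocks.
    """
    total_capability = sum(capabilities.values())
    if total_capability <= 0:
        raise ValueError("capabilities must sum to a positive value")

    # Ideal (fractional) share of blocks for each node.
    exact = {node: total_blocks * cap / total_capability
             for node, cap in capabilities.items()}

    # Start from the floor of each share.
    allocation = {node: int(share) for node, share in exact.items()}

    # Hand the leftover blocks to the nodes with the largest remainders.
    leftover = total_blocks - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - allocation[n],
                          reverse=True)
    for node in by_remainder[:leftover]:
        allocation[node] += 1

    return allocation


# Hypothetical three-node cluster with capability ratio 3 : 2 : 1.
cluster = {"fast": 3, "medium": 2, "slow": 1}
print(allocate_blocks(cluster, 12))
```

Under this assumption a faster node holds more blocks locally, so it can process more data without pulling blocks from slower nodes over the network, which is the kind of transfer cost the abstract says the strategy aims to reduce.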
Pages: 49-58
Page count: 10