Novel data-placement scheme for improving the data locality of Hadoop in heterogeneous environments

Cited by: 9
Authors
Bae, Minho [1]
Yeo, Sangho [1]
Park, Gyudong [2]
Oh, Sangyoon [1]
Affiliations
[1] Ajou Univ, Comp Engn, Suwon, South Korea
[2] Agcy Def Dev, Seoul, South Korea
Keywords
data locality; data placement; Hadoop MapReduce; heterogeneous environment; replication; STRATEGY;
DOI
10.1002/cpe.5752
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202; 0835;
Abstract
To address the challenging needs of high-performance big data processing, parallel-distributed frameworks such as Hadoop are being utilized extensively. However, in heterogeneous environments, the performance of Hadoop clusters is below par. This is primarily because the blocks of the clusters are allocated equally to all nodes without regard to differences in the capability of individual nodes. This results in reduced data locality. Thus, a new data-placement scheme that enhances data locality is required for Hadoop in heterogeneous environments. This article proposes a new data placement scheme that preserves the same degree of data locality in heterogeneous environments as that of the standard Hadoop, with only a small amount of replicated data. In the proposed scheme, only those blocks with the highest probability of being accessed remotely are selected and replicated. The results of experiments conducted indicate that the proposed scheme incurs only a 20% disk space overhead and has virtually the same data locality ratio as the standard Hadoop, which has a replication factor of three and 200% disk space overhead.
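The disk-space figures in the abstract can be checked with a little arithmetic. The following is a minimal sketch, not the authors' algorithm: it assumes equal-sized blocks, a baseline of one copy per block, and a per-block estimate of the probability of remote access that is supplied as input. The function names, the `budget_percent` parameter, and the sample probabilities are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def uniform_overhead(replication_factor: int) -> float:
    """Extra disk space, as a fraction of the original data size,
    when every block is stored `replication_factor` times."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return float(replication_factor - 1)


def select_blocks(remote_prob: Dict[str, float], budget_percent: int) -> List[str]:
    """Return the blocks with the highest estimated remote-access
    probability, limited to `budget_percent` percent of all blocks.
    (Illustrative selection rule only; ties are broken by block name.)"""
    limit = len(remote_prob) * budget_percent // 100
    ranked = sorted(remote_prob, key=lambda block: (-remote_prob[block], block))
    return ranked[:limit]


def selective_overhead(replicated_blocks: int, total_blocks: int) -> float:
    """Extra disk space when only some blocks receive one extra copy."""
    return replicated_blocks / total_blocks


if __name__ == "__main__":
    # Hypothetical remote-access probabilities for ten blocks.
    probs = {
        "b0": 0.10, "b1": 0.90, "b2": 0.30, "b3": 0.80, "b4": 0.20,
        "b5": 0.05, "b6": 0.40, "b7": 0.15, "b8": 0.25, "b9": 0.35,
    }
    chosen = select_blocks(probs, budget_percent=20)
    print("replication factor 3 overhead:", uniform_overhead(3))
    print("blocks chosen for one extra copy:", chosen)
    print("selective overhead:", selective_overhead(len(chosen), len(probs)))
```

Under these assumptions, a replication factor of three stores two extra copies of every block, which is the 200% overhead mentioned in the abstract, whereas giving one extra copy to a fifth of the blocks costs 20%.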
Pages: 11