DRAW: A New Data-gRouping-AWare Data Placement Scheme for Data Intensive Applications with Interest Locality

Citations: 0
Authors
Shang, Pengju [1]
Xiao, Qiangju [1]
Wang, Jun [1]
Affiliations
[1] Univ Cent Florida, Orlando, FL 32816 USA
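Source
2012 DIGEST ASIA-PACIFIC MAGNETIC RECORDING CONFERENCE (APMRC) | 2012
Funding
National Science Foundation (USA)
Keywords
MapReduce; Hadoop; Data-intensive; Data layout
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract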
Recent years have seen an increasing number of scientists employ data-parallel computing frameworks such as MapReduce and Hadoop to run data-intensive applications and conduct analysis. In these frameworks, which co-locate compute and storage, a well-chosen data placement scheme can significantly improve performance. Existing data-parallel frameworks, e.g., Hadoop or Hadoop-based clouds, distribute data using a random placement method for simplicity and load balance. However, we observe that many data-intensive applications exhibit interest locality: they sweep only part of a large data set. Data are often accessed together because of their grouping semantics. Because it does not take data grouping into consideration, random placement performs poorly and falls far below the efficiency of an optimal data distribution. In this paper, we develop a new Data-gRouping-AWare (DRAW) data placement scheme to address this problem. DRAW dynamically scrutinizes data accesses recorded in system log files. It extracts optimal data groupings and reorganizes data layouts to achieve the maximum parallelism per group subject to load balance. By running two real-world MapReduce applications with different data placement schemes on a 40-node test bed, we conclude that DRAW increases the total number of local map tasks executed by up to 59.8%, reduces the completion latency of the map phase by up to 41.7%, and improves the overall performance by 36.4%, in comparison with Hadoop's default random placement.
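The idea summarized in the abstract (derive groups of co-accessed data from access logs, then spread each group across nodes while keeping load even) can be illustrated with a small sketch. This is not the authors' algorithm: the log format, the `min_count` threshold, the union-find grouping, and the function names `co_access_counts`, `group_blocks`, and `place` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def co_access_counts(access_log):
    """Count how often each pair of blocks appears in the same logged task."""
    counts = defaultdict(int)
    for blocks in access_log:
        for a, b in combinations(sorted(set(blocks)), 2):
            counts[(a, b)] += 1
    return counts


def group_blocks(access_log, min_count=2):
    """Hypothetical grouping rule: blocks co-accessed at least
    min_count times end up in the same group (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Register every block, including ones never paired with another.
    for blocks in access_log:
        for b in blocks:
            find(b)

    # Merge blocks that are frequently accessed together.
    for (a, b), n in co_access_counts(access_log).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for b in list(parent):
        groups[find(b)].append(b)
    # Largest groups first; ties broken by block names for determinism.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


def place(groups, num_nodes):
    """Spread each group's blocks over the currently least-loaded nodes,
    so blocks of one group land on distinct nodes where possible."""
    load = [0] * num_nodes
    placement = {}
    for group in groups:
        # Node order is fixed per group: lightest load first, then node id.
        order = sorted(range(num_nodes), key=lambda n: (load[n], n))
        for i, block in enumerate(group):
            node = order[i % num_nodes]
            placement[block] = node
            load[node] += 1
    return placement


if __name__ == "__main__":
    # Made-up log: each entry lists the blocks one task read.
    log = [["a", "b", "c"], ["a", "b", "c"], ["d", "e"], ["d", "e"], ["f"]]
    groups = group_blocks(log)
    print(groups)
    print(place(groups, num_nodes=3))
```

Placing a group's blocks on distinct nodes lets map tasks over that group run in parallel on local data, whereas random placement may cluster several of the group's blocks on one node. A real system would also need to handle replication and changing access patterns, which this sketch ignores.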
Pages: 8
Related papers
50 results in total
  • [41] A data-locality-aware task scheduler for distributed social graph queries
    Jin, Jiahui
    Luo, Junzhou
    Du, Mingyang
    Dang, Yongcheng
    Li, Feng
    Zhang, Jinghui
    Song, Aibo
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 93 : 1010 - 1022
  • [42] A locality-aware shuffle optimization on fat-tree data centers
    Wang, Jihe
    Wang, Danghui
    Qiu, Meikang
    Chen, Yao
    Guo, Bing
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 89 : 31 - 43
  • [43] New Data Placement Strategy in the HADOOP Framework
    Elomari, Akram
    Hassouni, Larbi
    Maizate, Abderrahim
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (07) : 676 - 684
  • [44] Leveraging Data Intensive Applications on a Pervasive Computing Platform: the case of MapReduce
    Steffenel, Luiz Angelo
    Pinheiro, Manuele Kirch
    6TH INTERNATIONAL CONFERENCE ON AMBIENT SYSTEMS, NETWORKS AND TECHNOLOGIES (ANT-2015), THE 5TH INTERNATIONAL CONFERENCE ON SUSTAINABLE ENERGY INFORMATION TECHNOLOGY (SEIT-2015), 2015, 52 : 1034 - 1039
  • [45] An Improved GPU MapReduce Framework for Data Intensive Applications
    Nitu, Razvan
    Apostol, Elena
    Cristea, Valentin
    2014 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING (ICCP), 2014 : 355 - 362
  • [46] Monarch: A Durable Polymorphic Memory for Data Intensive Applications
    Prasad, Ananth Krishna
    Bojnordi, Mahdi Nazm
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (02) : 535 - 547
  • [47] SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis
    Dong, Bin
    Wu, Kesheng
    Byna, Suren
    Tang, Houjun
    HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2019, 2019, 11501 : 61 - 80
  • [48] An efficient deadline constrained and data locality aware dynamic scheduling framework for multitenancy clouds
    Ru, Jia
    Yang, Yun
    Grundy, John
    Keung, Jacky
    Hao, Li
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (05)
  • [49] Sharing-Aware InterCloud Scheduler for Data-Intensive Jobs
    Mehdi, Nawfal A.
    Holmes, Bryn
    Mamat, Ali
    Subramaniam, Shamala K.
    2012 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGIES, APPLICATIONS AND MANAGEMENT (ICCCTAM), 2012 : 22 - 26
  • [50] Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences
    Liangxiu Han
    Hwee Yong Ong
    Cluster Computing, 2015, 18 : 403 - 418