DRAW: A New Data-gRouping-AWare Data Placement Scheme for Data Intensive Applications with Interest Locality

Cited by: 0

Authors:

Shang, Pengju [1]

Xiao, Qiangju [1]

Wang, Jun [1]

Affiliations:

[1] Univ Cent Florida, Orlando, FL 32816 USA

Source:

2012 DIGEST ASIA-PACIFIC MAGNETIC RECORDING CONFERENCE (APMRC) | 2012

Funding:

U.S. National Science Foundation;

Keywords:

MapReduce; Hadoop; Data-intensive; Data layout;

DOI:

Not available

Chinese Library Classification:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Recent years have seen an increasing number of scientists employ data-parallel computing frameworks such as MapReduce and Hadoop to run data-intensive applications and conduct analysis. In these frameworks, which co-locate compute and storage, a well-chosen data placement scheme can significantly improve performance. Existing data-parallel frameworks, e.g. Hadoop or Hadoop-based clouds, distribute data using a random placement method for simplicity and load balance. However, we observe that many data-intensive applications exhibit interest locality: they sweep only part of a big data set. Data that are often accessed together reflect their grouping semantics. Because it does not take data grouping into consideration, random placement performs poorly and falls far below the efficiency of an optimal data distribution. In this paper, we develop a new Data-gRouping-AWare (DRAW) data placement scheme to address this problem. DRAW dynamically scrutinizes data accesses recorded in system log files. It extracts optimal data groupings and reorganizes data layouts to achieve the maximum parallelism per group subject to load balance. By running two real-world MapReduce applications with different data placement schemes on a 40-node test bed, we conclude that DRAW increases the total number of local map tasks executed by up to 59.8%, reduces the completion latency of the map phase by up to 41.7%, and improves the overall performance by 36.4%, in comparison with Hadoop's default random placement.
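
The idea in the abstract (derive groupings from access logs, then spread each group across nodes under a load-balance constraint) can be illustrated with a small Python sketch. This is not the DRAW algorithm from the paper, whose details are not given in this record. It is a simplified stand-in under stated assumptions: the log is a list of block-ID lists (one per task), blocks are grouped by a hypothetical co-access threshold `min_count`, and placement is a greedy least-loaded heuristic. All function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def co_access_counts(access_log):
    """Count how often each pair of blocks appears in the same task's access list."""
    counts = defaultdict(int)
    for blocks in access_log:
        for a, b in combinations(sorted(set(blocks)), 2):
            counts[(a, b)] += 1
    return counts


def group_blocks(access_log, min_count=2):
    """Merge blocks whose co-access count reaches min_count (an assumed threshold)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for blocks in access_log:
        for b in blocks:
            find(b)  # register every block, even ones that stay ungrouped
    for (a, b), n in co_access_counts(access_log).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for b in parent:
        groups[find(b)].append(b)
    return sorted(sorted(g) for g in groups.values())


def place_groups(groups, num_nodes):
    """Greedy placement: spread each group over distinct nodes, least-loaded first."""
    load = [0] * num_nodes
    placement = {}
    for group in sorted(groups, key=len, reverse=True):
        used = set()  # nodes already holding a block of this group
        for block in group:
            candidates = [n for n in range(num_nodes) if n not in used]
            node = min(candidates, key=lambda n: (load[n], n))
            placement[block] = node
            load[node] += 1
            used.add(node)
            if len(used) == num_nodes:
                used.clear()  # group is larger than the cluster: start a new round
    return placement


if __name__ == "__main__":
    log = [["b1", "b2", "b3"], ["b1", "b2", "b3"],
           ["b4", "b5"], ["b4", "b5"], ["b1", "b4"]]
    groups = group_blocks(log)
    print(groups)
    print(place_groups(groups, num_nodes=3))
```

Spreading the blocks of one group over different nodes lets the map tasks that read that group run in parallel on local data, whereas random placement may cluster them on a few nodes.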


Pages: 8

50 items in total

[1] DRAW: A New Data-gRouping-AWare Data Placement Scheme for Data Intensive Applications With Interest Locality
Wang, Jun
Xiao, Qiangju
Yin, Jiangling
Shang, Pengju
IEEE TRANSACTIONS ON MAGNETICS, 2013, 49 (06) : 2514 - 2520
[2] A new data-grouping-aware dynamic data placement method that take into account jobs execute frequency for Hadoop
Wu, Jia-xuan
Zhang, Chang-sheng
Zhang, Bin
Wang, Peng
MICROPROCESSORS AND MICROSYSTEMS, 2016, 47 : 161 - 169
[3] DPPACS: A Novel Data Partitioning and Placement Aware Computation Scheduling Scheme for Data-Intensive Cloud Applications
Reddy, K. Hemant Kumar
Roy, Diptendu Sinha
COMPUTER JOURNAL, 2016, 59 (01) : 64 - 82
[4] CLUST - Grouping Aware Data Placement for Improving the Performance of Large-Scale Data Management System
Vengadeswaran, Shanmugasundaram
Balasundaram, Sadhu Ramakrishnan
PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020), 2020, : 1 - 9
[5] A data placement strategy for data-intensive applications in cloud
Zheng P.
Cui L.-Z.
Wang H.-Y.
Xu M.
Jisuanji Xuebao/Chinese Journal of Computers, 2010, 33 (08) : 1472 - 1480
[6] BRPS: A Big Data Placement Strategy for Data Intensive Applications
Liu, Lihui
Song, Junping
Wang, Haibo
Lv, Pin
2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2016, : 813 - 820
[7] Awan: Locality-aware Resource Manager for Geo-distributed Data-intensive Applications
Jonathan, Albert
Chandra, Abhishek
Weissman, Jon
PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E), 2016, : 32 - 41
[8] EnLoc: Data Locality-aware Energy-efficient Scheduling Scheme for Cloud Data Centers
Kaur, Kujeet
Kumar, Neeraj
Garg, Sahil
Rodrigues, Joel J. P. C.
2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2018,
[9] A novel approach for improving data locality of MapReduce applications in cloud environment through intelligent data placement
Shabeera, T. P.
Kumar, S. D. Madhu
INTERNATIONAL JOURNAL OF SERVICES TECHNOLOGY AND MANAGEMENT, 2020, 26 (04) : 323 - 340
[10] A new paradigm in data intensive computing: Stork and the data-aware schedulers
Kosar, Tevfik
Challenges of Large Applications in Distributed Environments, Proceedings, 2006, : 5 - 12
