Parallel labeling of massive XML data with MapReduce

Authors:

Hyebong Choi

Kyong-Ha Lee

Yoon-Joon Lee

Affiliations:

[1] KAIST, Department of Computer Science

[2] ETRI, Intelligent Convergence Media Research Department, Broadcasting & Telecommunications Media Research Laboratory

Source:

The Journal of Supercomputing | 2014 / Vol. 67

Keywords:

Parallel computing; XML; Tree labeling algorithm; MapReduce

Abstract:

The volume of XML data has become enormous and still grows very quickly as many data have been typed in XML by virtue of its simplicity and extensibility. While a tree labeling algorithm has a crucial role in XML query processing, conventional algorithms are all sequential so that they fail to label a large volume of XML data in a timely manner. To address this issue, we devise parallel tree labeling algorithms for massive XML data. Specifically, we focus on how to efficiently label a single large XML file in parallel. We first propose parallel versions of two prominent tree labeling schemes based on the MapReduce framework. We then present techniques for runtime workload balancing and data repartition to solve performance issues caused by data skewness and MapReduce’s inherited limitation. Through extensive experiments with synthetic and real-world datasets on 15 nodes, we show that our parallel labeling algorithms are up to 17 times faster than conventional algorithms, providing strong durability against data skewness.

Pages: 408-437

Page count: 29

Citation:

Choi, Hyebong; Lee, Kyong-Ha; Lee, Yoon-Joon. Parallel labeling of massive XML data with MapReduce. Journal of Supercomputing, 2014, 67(2): 408-437.