Parallel labeling of massive XML data with MapReduce

Authors
Hyebong Choi
Kyong-Ha Lee
Yoon-Joon Lee
Affiliations
[1] KAIST, Department of Computer Science
[2] ETRI, Intelligent Convergence Media Research Department, Broadcasting & Telecommunications Media Research Laboratory
Source
The Journal of Supercomputing, 2014, Vol. 67, No. 2
Keywords
Parallel computing; XML; Tree labeling algorithm; MapReduce
Abstract
The volume of XML data has become enormous and continues to grow rapidly, as much data is now represented in XML owing to its simplicity and extensibility. Although tree labeling algorithms play a crucial role in XML query processing, conventional algorithms are all sequential and therefore cannot label large volumes of XML data in a timely manner. To address this issue, we devise parallel tree labeling algorithms for massive XML data, focusing specifically on how to efficiently label a single large XML file in parallel. We first propose parallel versions of two prominent tree labeling schemes based on the MapReduce framework. We then present techniques for runtime workload balancing and data repartitioning that address performance issues caused by data skew and by the inherent limitations of MapReduce. Through extensive experiments with synthetic and real-world datasets on 15 nodes, we show that our parallel labeling algorithms are up to 17 times faster than conventional algorithms and remain highly robust against data skew.
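The abstract does not name the two labeling schemes or describe the parallel algorithms, so the following is only an illustrative sketch under stated assumptions. It assumes a region (interval) scheme, in which every element receives (start, end, level) from one counter advanced at each tag event, so that element d is a descendant of a exactly when a.start < d.start and d.end < a.end. It then shows one way such labels could be computed over independent splits of a single document: each split is labeled with local counters (playing the role of a map task), and a sequential reconciliation pass adds each split's event offset and starting depth and matches tags that straddle split boundaries. All function names are hypothetical, the regex tokenizer assumes well-formed XML without comments, CDATA, or `>` inside attribute values, and the sketch omits the paper's workload-balancing and repartitioning techniques.

```python
import re

# Hypothetical tokenizer: assumes well-formed XML with no comments, CDATA,
# or '>' characters inside attribute values.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^>]*?(/?)>")


def events(xml):
    """Yield ('open', name) and ('close', name); self-closing tags yield both."""
    for m in TAG_RE.finditer(xml):
        closing, name, selfclose = m.group(1), m.group(2), m.group(3)
        if closing:
            yield ("close", name)
        else:
            yield ("open", name)
            if selfclose:
                yield ("close", name)


def label_sequential(xml):
    """Baseline region labeling: (name, start, end, level) in document order."""
    labels, stack, counter = [], [], 0
    for kind, name in events(xml):
        if kind == "open":
            labels.append([name, counter, -1, len(stack)])
            stack.append(len(labels) - 1)
        else:
            labels[stack.pop()][2] = counter
        counter += 1
    return [tuple(rec) for rec in labels]


def map_chunk(chunk):
    """Label one split with local position and relative depth (a 'map' step)."""
    done, stack, dangling_closes, depth = [], [], [], 0
    for pos, (kind, name) in enumerate(chunk):
        if kind == "open":
            stack.append([name, pos, -1, depth])
            depth += 1
        else:
            depth -= 1
            if stack:  # the matching open tag is inside this split
                rec = stack.pop()
                rec[2] = pos
                done.append(rec)
            else:  # the matching open tag is in an earlier split
                dangling_closes.append(pos)
    return {"n": len(chunk), "delta": depth, "done": done,
            "open": stack, "closes": dangling_closes}


def fix_up(summaries):
    """Shift local labels by global offset/depth and match boundary tags."""
    labels, offset, base, pending = [], 0, 0, []
    for s in summaries:  # must be visited in document order
        for name, st, en, rel in s["done"]:
            labels.append((name, offset + st, offset + en, base + rel))
        for pos in s["closes"]:  # all dangling closes precede dangling opens
            rec = pending.pop()
            rec[2] = offset + pos
            labels.append(tuple(rec))
        for name, st, _, rel in s["open"]:
            pending.append([name, offset + st, -1, base + rel])
        offset += s["n"]
        base += s["delta"]
    return sorted(labels, key=lambda rec: rec[1])


def label_parallel(xml, k):
    """Split the event stream into k pieces; each map_chunk call is independent."""
    evs = list(events(xml))
    size = max(1, -(-len(evs) // k))  # ceiling division
    chunks = [evs[i:i + size] for i in range(0, len(evs), size)]
    return fix_up([map_chunk(c) for c in chunks])


def is_descendant(d, a):
    """Region-label containment test: d's interval lies strictly inside a's."""
    return a[1] < d[1] and d[2] < a[2]
```

For example, `label_parallel("<a><b/><c><d/></c></a>", 3)` yields the same labels as `label_sequential` on that input. The reconciliation pass can get away with a single stack carried across splits because, within any split, every closing tag whose opening tag lies in an earlier split must appear before every opening tag left unclosed at the end of that split.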
Pages: 408-437
Number of pages: 29
Related papers
  • [1] Parallel labeling of massive XML data with MapReduce
    Choi, Hyebong
    Lee, Kyong-Ha
    Lee, Yoon-Joon
    JOURNAL OF SUPERCOMPUTING, 2014, 67 (02): 408-437
  • [2] Parallel Prime Number Labeling of Large XML Data Using MapReduce
    Ahn, Jinhyun
    Im, Dong-Hyuk
    Lee, Taewhi
    Kim, Hong-Gee
    2016 6TH INTERNATIONAL CONFERENCE ON IT CONVERGENCE AND SECURITY (ICITCS 2016), 2016: 176-177
  • [3] A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce
    Ahn, Jinhyun
    Im, Dong-Hyuk
    Lee, Taewhi
    Kim, Hong-Gee
    JOURNAL OF SUPERCOMPUTING, 2017, 73 (02): 810-836
  • [4] Parallel Processing of Massive EEG Data with MapReduce
    Wang, Lizhe
    Chen, Dan
    Ranjan, Rajiv
    Khan, Samee U.
    Kolodziej, Joanna
    Wang, Jun
    PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012: 164-171
  • [5] Parallel Accessing Massive NetCDF Data Based on MapReduce
    Zhao, Hui
    Ai, SiYun
    Lv, ZhenHua
    Li, Bo
    WEB INFORMATION SYSTEMS AND MINING, 2010, 6318: 425-+
  • [6] Parallel Map Matching on Massive Vehicle GPS Data Using MapReduce
    Huang, Jian
    Qiao, Shaoqing
    Yu, Haitao
    Qie, Jinhui
    Liu, Chunwei
    2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013: 1498-1503
  • [7] Analysis of Massive Industrial Data using MapReduce Framework for Parallel Processing
    Aly, Mohab
    Yacout, Soumaya
    Shaban, Yasser
    2017 ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM, 2017
  • [8] Parallel similarity joins on massive high-dimensional data using MapReduce
    Ma, Youzhong
    Meng, Xiaofeng
    Wang, Shaoya
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (01): 166-183
  • [9] Labeling Instances in Evolving Data Streams with MapReduce
    Haque, Ahsanul
    Parker, Brandon
    Khan, Latifur
    2013 IEEE INTERNATIONAL CONGRESS ON BIG DATA, 2013: 387-394