An efficient theta-join query processing in distributed environment

Cited by: 2
Authors
Liu, Wenjie [1]
Li, Zhanhuai [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp, Xian 710072, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Parallel distributed framework; Theta-join algorithm; Query optimization; Large scale data processing;
DOI
10.1016/j.jpdc.2018.07.007
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Theta-join queries are useful in many data analysis tasks, but they are not processed efficiently in distributed environments, especially on large-scale data. Although much progress has been made in processing theta-joins with the MapReduce paradigm, existing methods are either complex, requiring fundamental changes to the MapReduce framework, or consider only the overhead of load balancing in the network. When the data scale is large, they incur high computation cost and induce OOM (Out of Memory) errors. In this work, we propose a filter method for theta-joins that aims to reduce the computation cost and minimize execution time in a distributed environment. We consider not only load balance in the cluster but also memory cost in the parallel framework. We also propose a keys-based join solution for multi-way theta-joins that reduces the amount of data in the cross product and thereby improves join performance. We implement our methods in Spark, a popular general-purpose data processing framework. The experimental results demonstrate that our methods significantly improve the performance of theta-joins compared with state-of-the-art solutions. © 2018 Elsevier Inc. All rights reserved.
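To make the idea of filtering before the cross product concrete, here is a minimal single-machine sketch in plain Python. It is not the authors' algorithm and does not use Spark. It assumes a single join condition `r < s`, and the helper names `partition` and `theta_join_less_than` are hypothetical, chosen only for illustration. Each input is sorted and split into chunks, and a pair of chunks is skipped whenever its minimum and maximum values show that no pair inside it can satisfy the condition.

```python
from itertools import product


def partition(values, size):
    """Split a sorted copy of `values` into chunks of at most `size` items."""
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def theta_join_less_than(r_values, s_values, size=2):
    """Return (pairs, compared): all (r, s) with r < s, and the number of
    element-level comparisons actually performed.

    Illustrative assumption: the theta condition is the single inequality r < s.
    """
    r_parts = partition(r_values, size)
    s_parts = partition(s_values, size)
    pairs = []
    compared = 0
    for r_part in r_parts:
        for s_part in s_parts:
            # Filter step: chunks are sorted, so r_part[0] is the smallest r
            # and s_part[-1] is the largest s. If even these do not satisfy
            # r < s, no pair from the two chunks can, so skip the cross product.
            if r_part[0] >= s_part[-1]:
                continue
            for r, s in product(r_part, s_part):
                compared += 1
                if r < s:
                    pairs.append((r, s))
    return pairs, compared


if __name__ == "__main__":
    pairs, compared = theta_join_less_than([5, 1, 9, 7], [2, 6, 3, 8], size=2)
    print(sorted(pairs))
    print(compared)
```

In a distributed setting the chunks would correspond to data partitions, and skipping a chunk pair would avoid shipping and joining those partitions at all. How the paper chooses partitions and balances memory use is not described in the abstract, so this sketch makes no claim about it.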
Pages: 42-52
Page count: 11