Join processing with threshold-based filtering in MapReduce

Cited by: 3
Authors
Lee, Taewhi [1]
Bae, Hye-Chan [2]
Kim, Hyoung-Joo [3]
Affiliations
[1] Elect & Telecommun Res Inst, BigData Software Platform Res Dept, Taejon 305700, South Korea
[2] Samsung Elect Co Ltd, Media Solut Ctr, Suwon 443742, Gyeonggi Do, South Korea
[3] Seoul Natl Univ, Dept Comp Sci & Engn, Seoul 151744, South Korea
Source
JOURNAL OF SUPERCOMPUTING | 2014 / Vol. 69 / Issue 02
Funding
National Research Foundation, Singapore;
Keywords
Join processing; Threshold-based filtering; MapReduce; Hadoop; DISTRIBUTED JOINS;
DOI
10.1007/s11227-014-1179-9
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data analytics, in particular those involving heterogeneous data, often require join operations on datasets collected from different sources. MapReduce, one of the most popular frameworks for large-scale data processing, is not suited for joining multiple datasets. This is because MapReduce often produces a large number of redundant intermediate results, irrespective of the size of the joined records. Although several existing approaches attempt to reduce the number of such redundant results using Bloom filters, they may be inefficient if large portions of records are joined or the number of distinct keys is large. To alleviate this problem, we propose a join processing method with threshold-based filtering in MapReduce, called TMFR-Join, which is an abbreviation for "Threshold-based Map-Filter-Reduce Join". TMFR-Join applies filters according to their performance, which is estimated in terms of false-positive rates. It also provides a general framework for exploiting various filtering techniques that support certain desired operations. The experimental results indicate that the performance of TMFR-Join is close to that of the better of existing join processing techniques, both with and without filters.
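The idea of applying a filter only when its estimated false-positive rate is acceptable can be illustrated with a minimal sketch. It uses the standard Bloom filter approximation (1 - e^(-kn/m))^k for m bits, k hash functions, and n inserted keys. The function names, the `threshold` parameter, and the decision rule itself are assumptions made for illustration; the abstract does not specify how TMFR-Join computes or uses its threshold.

```python
import math


def bloom_false_positive_rate(num_bits: int, num_hashes: int, num_keys: int) -> float:
    """Estimate a Bloom filter's false-positive rate.

    Uses the standard approximation (1 - e^(-k*n/m))^k, where m is the
    number of bits, k the number of hash functions, n the number of keys.
    """
    if num_bits <= 0 or num_hashes <= 0:
        raise ValueError("num_bits and num_hashes must be positive")
    if num_keys < 0:
        raise ValueError("num_keys must be non-negative")
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


def should_apply_filter(num_bits: int, num_hashes: int, num_keys: int,
                        threshold: float) -> bool:
    """Hypothetical rule (an assumption, not taken from the paper):
    keep filtering only while the estimated false-positive rate stays
    below the given threshold."""
    return bloom_false_positive_rate(num_bits, num_hashes, num_keys) < threshold


if __name__ == "__main__":
    # Few distinct keys relative to the filter size: filtering looks worthwhile.
    print(should_apply_filter(num_bits=10_000, num_hashes=3, num_keys=100, threshold=0.05))
    # Many distinct keys saturate the filter: the estimate exceeds the threshold.
    print(should_apply_filter(num_bits=1_000, num_hashes=3, num_keys=10_000, threshold=0.05))
```

Under this rule, a filter built over many distinct keys saturates, its estimated false-positive rate approaches 1, and the join would fall back to processing without the filter. This matches the case the abstract identifies as inefficient for Bloom-filter-based joins.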
Pages: 793 - 813
Page count: 21
Related papers
50 in total
  • [1] Join processing with threshold-based filtering in MapReduce
    Taewhi Lee
    Hye-Chan Bae
    Hyoung-Joo Kim
    The Journal of Supercomputing, 2014, 69 : 793 - 813
  • [2] An Efficient MapReduce-Based Parallel Processing Framework for User-Based Collaborative Filtering
    Jeong, Hanjo
    Cha, Kyung Jin
    SYMMETRY-BASEL, 2019, 11 (06):
  • [3] Optimizations for filter-based join algorithms in MapReduce
    Rababa, Salahaldeen
    Al-Badarneh, Amer
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (05) : 8963 - 8980
  • [4] SigMR: MapReduce-based SPARQL query processing by signature encoding and multi-way join
    Ahn, Jinhyun
    Im, Dong-Hyuk
    Kim, Hong-Gee
    JOURNAL OF SUPERCOMPUTING, 2015, 71 (10) : 3695 - 3725
  • [5] SigMR: MapReduce-based SPARQL query processing by signature encoding and multi-way join
    Jinhyun Ahn
    Dong-Hyuk Im
    Hong-Gee Kim
    The Journal of Supercomputing, 2015, 71 : 3695 - 3725
  • [6] A Boundary Filtering Based Spatial Join Query Processing Optimization Algorithm
    Qiao, Baiyou
    Zhu, Junhai
    Shen, Muchuan
    Chen, Yang
    2015 12TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2015, : 1764 - 1769
  • [7] An Efficient Two-Table Join Query Processing Based on Extended Bloom Filter in MapReduce
    Wang, Junlu
    Pang, Jun
    Li, Xiaoyan
    Han, Baishuo
    Huang, Lei
    Ding, Linlin
    WEB-AGE INFORMATION MANAGEMENT, 2016, 9998 : 249 - 258
  • [8] Efficient Snapshot KNN Join Processing for Large Data Using MapReduce
    Hu, Yupeng
    Yang, Chong
    Ji, Cun
    Xu, Yang
    Li, Xueqing
    2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016, : 713 - 720
  • [9] A Density-Aware Similarity Join Query Processing Algorithm on MapReduce
    Jang, Miyoung
    Song, Youngho
    Chang, Jae-Woo
    ADVANCED MULTIMEDIA AND UBIQUITOUS ENGINEERING: FUTURETECH & MUE, 2016, 393 : 469 - 475
  • [10] User Based Collaborative Filtering Using Bloom Filter with MapReduce
    Shinde, Anita
    Savant, Ila
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT, ICT4SD 2015, VOL 1, 2016, 408 : 115 - 123