Optimizing Distributed Joins with Bloom Filters Using MapReduce

Authors
Zhang, Changchun [1]
Wu, Lei [1]
Li, Jing [1]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China
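Source
COMPUTER APPLICATIONS FOR GRAPHICS, GRID COMPUTING, AND INDUSTRIAL ENVIRONMENT | 2012 / Vol. 351
Keywords
Bloom Filter; MapReduce; Query Optimization
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract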
The MapReduce framework is increasingly used to process and analyze large-scale datasets on large clusters. Join processing in MapReduce has attracted considerable research attention in recent years. Distributed joins based on the Bloom filter have proven to be a successful technique for improving join efficiency. However, the potential of the Bloom filter has not been fully exploited, especially in the MapReduce environment. In this paper, we present several strategies for building a Bloom filter over a large dataset using MapReduce, compare several bloom-join algorithms, and show how to improve the performance of two-way and multi-way joins. Our experiments show that the method is feasible and effective.
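The general idea behind a Bloom-filter join can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the class `BloomFilter`, the functions `build_filter` and `bloom_join`, the filter size, and the hash scheme are all assumptions chosen for illustration, and the map and reduce phases are simulated with plain Python loops rather than a real MapReduce runtime. The sketch builds one partial filter per input split, merges the partial filters with a bitwise OR, and then uses the merged filter to drop records of the larger relation that cannot match before the simulated shuffle.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hashing are assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, key):
        # Double hashing: derive num_hashes positions from one SHA-256 digest.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))

    def union(self, other):
        # Filters with identical parameters merge by bitwise OR.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def build_filter(splits, key_of):
    """Simulated map phase builds one filter per split; reduce phase ORs them."""
    partials = []
    for split in splits:
        bf = BloomFilter()
        for record in split:
            bf.add(key_of(record))
        partials.append(bf)
    merged = BloomFilter()
    for bf in partials:
        merged = merged.union(bf)
    return merged


def bloom_join(small_splits, large_splits, key_of):
    """Two-way equi-join that filters the large relation before the shuffle."""
    bf = build_filter(small_splits, key_of)
    shuffled = {}  # join key -> (records from small, records from large)
    for split in small_splits:
        for record in split:
            shuffled.setdefault(key_of(record), ([], []))[0].append(record)
    for split in large_splits:
        for record in split:
            key = key_of(record)
            if key in bf:  # records that cannot match are dropped here
                shuffled.setdefault(key, ([], []))[1].append(record)
    # Simulated reduce phase: exact key match removes any false positives.
    return [(l, r) for left, right in shuffled.values() for l in left for r in right]
```

Because a Bloom filter has no false negatives, no matching record is lost by the filtering step; false positives only cost some extra shuffle traffic and are eliminated by the exact key comparison in the reduce step.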
Pages: 88-95
Number of pages: 8