Performance Evaluation of Bloom Filter Size in Map-side and Reduce-side Bloom Joins

被引:0
作者
Al-Badarneh, Amer [1 ]
Najadat, Hassan [1 ]
Rababah, Salah [1 ]
机构
[1] Jordan Univ Sci & Technol, Irbid 22110, Jordan
来源
2017 8TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS) | 2017年
关键词
Hadoop; MapReduce; Bloom Filter; Map-side Bloom join; Reduce-side Bloom join;
D O I
暂无
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Map Reduce (MP) is an efficient programming model for processing big data. However, MR has some limitations in performing the join operation. Recent researches have been made to alleviate this problem, such as Bloom join. The idea of the Bloom join lies in constructing a Bloom filter to remove redundant records before performing the join operation. The size of the constructed filter is very critical and it should be chosen in a good manner. In this paper, we evaluate the performance of the Bloom filter size for two Bloom join algorithms, Map-side Bloom join and Reduce-side Bloom join. In our methodology, we constructed multiple Bloom filters with different sizes for two static input datasets. Our experimental results show that it is not always the best solution to construct a small or a large filter size to produce a good performance, it should be constructed based on the size of the input datasets. Also, the results show that tuning the Bloom filter size causes major effects on the join performance. Furthermore, the results show that it is recommended to choose small sizes of the Bloom filter, small enough to produce neglected false positive rate, in the implementation of the two algorithms when there is a concern about the memory. On the other hand, small to medium sizes of the Bloom filter in the Reduce-side join produce smaller elapsed time compared to the Map-side join, while large sizes produce larger elapsed time.
引用
收藏
页码:165 / 170
页数:6
相关论文
共 16 条
  • [1] [Anonymous], 2010, EDBT, DOI [DOI 10.1145/1739041.1739056, 10.1145/1739041.1739056]
  • [2] [Anonymous], 2003, Internet Mathematics, DOI DOI 10.1080/15427951.2004.10129096
  • [3] [Anonymous], 2010, P ACM SIGMOD INT C M, DOI DOI 10.1145/1807167.1807273
  • [4] USING SEMI-JOINS TO SOLVE RELATIONAL QUERIES
    BERNSTEIN, PA
    CHIU, DMW
    [J]. JOURNAL OF THE ACM, 1981, 28 (01) : 25 - 40
  • [5] Dave Pinal, 2013, WHAT IS MAPREDUCE
  • [6] Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDE '04), P137
  • [7] Juniper Networks, 2012, INTR BIG DAT INFR NE
  • [8] Katsoulis Spyridon, 2011, THESIS, P38
  • [9] Khafagy M. H., 2015, INT J SCI ENG RES, V6, P705
  • [10] Koutris P., 2011, BLOOM FILTERS DISTRI