Optimizing Distributed Joins with Bloom Filters Using MapReduce

Cited by: 0
Authors
Zhang, Changchun [1]
Wu, Lei [1]
Li, Jing [1]
Affiliation
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China
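Source
COMPUTER APPLICATIONS FOR GRAPHICS, GRID COMPUTING, AND INDUSTRIAL ENVIRONMENT | 2012 / Vol. 351
Keywords
Bloom Filter; MapReduce; Query Optimization;
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract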
The MapReduce framework is increasingly used to process and analyze large-scale datasets on large clusters. Join processing in MapReduce has attracted growing research attention in recent years. Distributed joins based on Bloom filters have proven to be an effective technique for improving join efficiency. However, the potential of the Bloom filter has not been fully exploited, especially in the MapReduce environment. In this paper, we present several strategies for building a Bloom filter over a large dataset using MapReduce, compare several Bloom-join algorithms, and show how to improve the performance of two-way and multi-way joins. Our experiments show that the proposed method is feasible and effective.
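The general idea behind a Bloom join can be illustrated with a minimal single-process Python sketch: build a Bloom filter over the join keys of one input (R, assumed here to be the smaller side) in parallel, then use it to discard tuples of the other input (S) that cannot match before they are shuffled. This is not the authors' implementation and does not reproduce their specific strategies. The map and reduce phases are simulated with plain loops rather than Hadoop, and the filter size `m`, hash count `k`, the SHA-256 double-hashing scheme, and the helpers `build_filter` and `bloom_join` are hypothetical choices made for illustration.

```python
import hashlib
from collections import defaultdict

# Illustrative sketch only: MapReduce phases are simulated in-process (not Hadoop),
# and BloomFilter / build_filter / bloom_join are hypothetical names.


class BloomFilter:
    """Bloom filter with m bits and k hash positions per key."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int serves as an m-bit array

    def _positions(self, key):
        # Double hashing (assumed scheme): k positions from one SHA-256 digest.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> p) & 1 for p in self._positions(key))

    def union(self, other):
        # Filters sharing m, k and hash functions merge with a bitwise OR.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k")
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged


def build_filter(partitions, m, k):
    """Hypothetical helper: one local filter per 'map' split, OR-ed in a 'reduce'."""
    merged = BloomFilter(m, k)
    for part in partitions:  # map task: build a local filter over this split
        local = BloomFilter(m, k)
        for key, _ in part:
            local.add(key)
        merged = merged.union(local)  # reduce task: merge the local filters
    return merged


def bloom_join(r_partitions, s_partitions, m=1024, k=3):
    """Hypothetical helper: equi-join R and S on key, pre-filtering S with R's filter."""
    bf = build_filter(r_partitions, m, k)
    # Map side of the join: drop S tuples whose key is definitely not in R,
    # so they never need to be shuffled.
    s_kept = [t for part in s_partitions for t in part if bf.might_contain(t[0])]
    # Reduce side: group R by key and pair each value with the surviving S tuples.
    r_by_key = defaultdict(list)
    for part in r_partitions:
        for key, val in part:
            r_by_key[key].append(val)
    joined = [(key, rv, sv) for key, sv in s_kept for rv in r_by_key.get(key, [])]
    return joined, len(s_kept)
```

Because a Bloom filter never produces false negatives, pre-filtering S cannot drop a matching tuple; a false positive only means a non-matching tuple is still shuffled and then discarded on the reduce side. Merging per-split filters with a bitwise OR is valid only because every map task uses the same `m`, `k`, and hash functions.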
Pages: 88-95
Number of pages: 8
Related papers
50 results in total
  • [31] Bloom Filters and Compact Hash Codes for Efficient and Distributed Image Retrieval
    Salvi, Andrea
    Ercoli, Simone
    Bertini, Marco
    Del Bimbo, Alberto
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2016, : 515 - 520
  • [32] DISTRIBUTED LOG ANALYSIS ON THE CLOUD USING MapReduce
    Aydin, Galip
    Hallac, Ibrahim R.
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2016, 23 (04): : 1011 - 1016
  • [33] Optimizing MapReduce Scheduling using Datanode Load Prediction
    Patel, Dharmesh
    Hasan, Mosin
    Sharma, Kirti
    2015 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, SIGNALS, COMMUNICATION AND OPTIMIZATION (EESCO), 2015,
  • [34] Performance Evaluation of Bloom Filter Size in Map-side and Reduce-side Bloom Joins
    Al-Badarneh, Amer
    Najadat, Hassan
    Rababah, Salah
    2017 8TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2017, : 165 - 170
  • [35] Fast URL lookup using parallel bloom filters
    Zhou, Zhou
    Fu, Wen-Liang
    Song, Tian
    Liu, Qing-Yun
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2015, 43 (09): : 1833 - 1840
  • [36] Optimizing Cloud MapReduce for Processing Stream Data using Pipelining
    Karve, Rutvik
    Dahiphale, Devendra
    Chhajer, Amit
    UKSIM FIFTH EUROPEAN MODELLING SYMPOSIUM ON COMPUTER MODELLING AND SIMULATION (EMS 2011), 2011, : 344 - 349
  • [37] Towards an Efficient and Distributed DBSCAN Algorithm Using MapReduce
    Coelho da Silva, Ticiana L.
    Araujo Neto, Antonio C.
    Magalhes, Regis Pires
    de Farias, Victor A. E.
    de Macedo, Jose A. F.
    Machado, Javam C.
    ENTERPRISE INFORMATION SYSTEMS, ICEIS 2014, 2015, 227 : 75 - 90
  • [38] GPU accelerated information retrieval using Bloom filters
    Iacob, Alexandru
    Itu, Lucian
    Sasu, Lucian
    Moldoveanu, Florin
    Suciu, Constantin
    2015 19TH INTERNATIONAL CONFERENCE ON SYSTEM THEORY, CONTROL AND COMPUTING (ICSTCC), 2015, : 872 - 876
  • [39] An Experimental Survey of MapReduce-Based Similarity Joins
    Silva, Yasin N.
    Reed, Jason
    Brown, Kyle
    Wadsworth, Adelbert
    Rong, Chuitian
    SIMILARITY SEARCH AND APPLICATIONS, SISAP 2016, 2016, 9939 : 181 - 195
  • [40] Distributed discovery of frequent subgraphs of a network using MapReduce
    Shahrivari, Saeed
    Jalili, Saeed
    COMPUTING, 2015, 97 (11) : 1101 - 1120