Application of Filters to Multiway Joins in MapReduce

Cited by: 3
Authors
Lee, Taewhi [1]
Im, Dong-Hyuk [2]
Kim, Hangkyu [3]
Kim, Hyoung-Joo [1]
Affiliations
[1] Seoul Natl Univ, Sch Comp Sci & Engn, Seoul 151744, South Korea
[2] Hoseo Univ, Dept Comp & Informat Engn, Asan 336795, Chungnam, South Korea
[3] Samsung Elect, Software Ctr, Suwon 443370, Gyeonggi, South Korea
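Funding
National Research Foundation of Singapore;
Keywords
DOI
10.1155/2014/249418
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract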
Joining multiple datasets in MapReduce may amplify the disk and network overheads because intermediate join results have to be written to the underlying distributed file system, or map output records have to be replicated multiple times. This paper proposes a method for applying filters based on the processing order of input datasets, which is appropriate for the two types of multiway joins: common attribute joins and distinct attribute joins. The number of redundant records filtered depends on the processing order. In common attribute joins, the input records do not need to be replicated, so a set of filters is created, which are applied in turn. In distinct attribute joins, the input records have to be replicated, so multiple sets of filters need to be created, which depend on the number of join attributes. The experimental results showed that our approach outperformed a cascade of two-way joins and basic multiway joins in cases where small portions of input datasets were joined.
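The common attribute case described in the abstract, where a set of filters is created and applied in turn, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes Bloom filters as the filter type (the abstract does not name one), runs in a single process rather than as MapReduce jobs, and the names `BloomFilter`, `filter_in_order`, and the sample datasets `r`, `s`, `t` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; the sizes are illustrative assumptions, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False positives are possible; false negatives are not.
        return all(self.bits[pos] for pos in self._positions(key))


def filter_in_order(datasets):
    """Process datasets (lists of (join_key, value)) in the given order.

    The filter built from the survivors of one dataset prunes the next,
    so later datasets in the order are filtered more aggressively.
    """
    survivors = []
    current = None
    for records in datasets:
        if current is not None:
            records = [rec for rec in records if current.might_contain(rec[0])]
        next_filter = BloomFilter()
        for key, _ in records:
            next_filter.add(key)
        survivors.append(records)
        current = next_filter
    return survivors


# Hypothetical datasets that all join on the same attribute (the key).
r = [(1, "r1"), (2, "r2"), (3, "r3")]
s = [(2, "s2"), (3, "s3"), (4, "s4")]
t = [(3, "t3"), (5, "t5")]
kept_r, kept_s, kept_t = filter_in_order([r, s, t])
```

In this sketch the first dataset is never filtered and the last one is filtered by every earlier dataset, which is one way to see why the number of redundant records removed depends on the processing order. Because a Bloom filter can report false positives, the surviving records are a superset of the records that actually join, so the join itself still has to be performed afterwards.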
Pages: 11