Fuzzy Joins in MapReduce: Edit and Jaccard Distance

Cited by: 0
Authors
Kimmett, Ben [1]
Thomo, Alex [1]
Srinivasan, Venkatesh [1]
Affiliations
[1] Univ Victoria, Victoria, BC V8W 2Y2, Canada
Source
2016 7TH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS & APPLICATIONS (IISA) | 2016
Keywords
Fuzzy Join; Similarity Join; MapReduce; Entity Resolution; Record Linkage;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In ICDE'12, Afrati, Das Sarma, Menestrina, Parameswaran and Ullman proposed similarity join algorithms for MapReduce. In this paper, we evaluate and extend their research, testing their proposed algorithms using edit distance and Jaccard similarity. We provide details of adaptations needed to implement their algorithms based on these similarity measures. We conduct an extensive experimental study on large datasets and evaluate the algorithms across several dimensions that define the performance profile in MapReduce.
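The abstract names edit distance and Jaccard similarity as the two measures but this record does not define them. As a rough illustration only, the sketch below uses the standard textbook definitions (Levenshtein edit distance and set-based Jaccard similarity). The function names are hypothetical and are not taken from the paper, and the code does not reproduce the MapReduce join algorithms themselves.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard_similarity(x, y) -> float:
    """Size of the intersection divided by size of the union."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0  # assumption: two empty sets count as identical
    return len(x & y) / len(x | y)


def jaccard_distance(x, y) -> float:
    """Jaccard distance is one minus Jaccard similarity."""
    return 1.0 - jaccard_similarity(x, y)
```

Under these definitions, a fuzzy join would pair two records when their edit distance falls at or below a chosen threshold, or when their Jaccard similarity meets or exceeds one.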
Pages: 6
Related papers
50 in total
  • [21] Strategic and suave processing for performing similarity joins using MapReduce
    Mahalakshmi Lakshminarayanan
    William F. Acosta
    Robert C. Green
    Vijay Devabhaktuni
    The Journal of Supercomputing, 2014, 69 : 930 - 954
  • [22] Strategic and suave processing for performing similarity joins using MapReduce
    Lakshminarayanan, Mahalakshmi
    Acosta, William F.
    Green, Robert C., II
    Devabhaktuni, Vijay
    JOURNAL OF SUPERCOMPUTING, 2014, 69 (02) : 930 - 954
  • [23] Large-Scale Similarity Join with Edit-Distance Constraints
    Lin, Chen
    Yu, Haiyang
    Weng, Wei
    He, Xianmang
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, PT II, 2014, 8422 : 328 - 342
  • [24] Handling data skew in joins based on cluster cost partitioning for MapReduce
    Wang, Yang
    Zhong, Yong
    Ma, Qingshan
    Yang, Guanci
    MULTIAGENT AND GRID SYSTEMS, 2018, 14 (01) : 103 - 123
  • [25] Parallel Computation of k-Nearest Neighbor Joins Using MapReduce
    Kim, Wooyeol
    Kim, Younghoon
    Shim, Kyuseok
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 696 - 705
  • [26] SEJ: An Even Approach to Multiway Theta-Joins using MapReduce
    Zhang, Changchun
    Li, Jing
    Wu, Lei
    Lin, Meiyan
    Liu, Weiqing
    SECOND INTERNATIONAL CONFERENCE ON CLOUD AND GREEN COMPUTING / SECOND INTERNATIONAL CONFERENCE ON SOCIAL COMPUTING AND ITS APPLICATIONS (CGC/SCA 2012), 2012, : 73 - 80
  • [27] Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce
    Song, Ge
    Rochas, Justine
    Huet, Fabrice
    Magoules, Frederic
    23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015), 2015, : 279 - 287
  • [28] Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples
    Li, Peng
    Cheng, Xiang
    Chu, Xu
    He, Yeye
    Chaudhuri, Surajit
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 1064 - 1076
  • [29] Genetic-Fuzzy Mining with MapReduce
    Hong, Tzung-Pei
    Liu, Yu-Yang
    Wu, Min-Thai
    Chen, Chun-Hao
    Wang, Leon Shyue-Liang
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 3294 - 3298
  • [30] K Nearest Neighbour Joins for Big Data on MapReduce: A Theoretical and Experimental Analysis
    Song, Ge
    Rochas, Justine
    El Beze, Lea
    Huet, Fabrice
    Magoules, Frederic
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (09) : 2376 - 2392