Fuzzy Joins in MapReduce: Edit and Jaccard Distance

Cited by: 0
Authors
Kimmett, Ben [1]
Thomo, Alex [1]
Srinivasan, Venkatesh [1]
Affiliation
[1] Univ Victoria, Victoria, BC V8W 2Y2, Canada
Source
2016 7TH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS & APPLICATIONS (IISA) | 2016
Keywords
Fuzzy Join; Similarity Join; MapReduce; Entity Resolution; Record Linkage;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In ICDE'12, Afrati, Das Sarma, Menestrina, Parameswaran and Ullman proposed similarity join algorithms for MapReduce. In this paper, we evaluate and extend their research, testing their proposed algorithms using edit distance and Jaccard similarity. We provide details of adaptations needed to implement their algorithms based on these similarity measures. We conduct an extensive experimental study on large datasets and evaluate the algorithms across several dimensions that define the performance profile in MapReduce.
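For readers unfamiliar with the two similarity measures named in the abstract, the following is a minimal single-machine sketch of edit distance and Jaccard similarity, plus a naive all-pairs fuzzy join. It is not the MapReduce algorithms evaluated in the paper; the function names (`edit_distance`, `jaccard_similarity`, `naive_fuzzy_join`) and the `max_edits` threshold parameter are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard_similarity(x, y) -> float:
    """|X ∩ Y| / |X ∪ Y| over the distinct elements of x and y."""
    sx, sy = set(x), set(y)
    if not sx and not sy:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(sx & sy) / len(sx | sy)


def naive_fuzzy_join(records, max_edits):
    """All-pairs baseline: keep pairs within max_edits of each other."""
    return [(a, b) for a, b in combinations(records, 2)
            if edit_distance(a, b) <= max_edits]
```

The all-pairs baseline compares every pair of records, which is the quadratic cost that distributed similarity-join algorithms aim to reduce.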
Pages: 6