Fuzzy Joins in MapReduce: Edit and Jaccard Distance

Cited by: 0
Authors
Kimmett, Ben [1]
Thomo, Alex [1]
Srinivasan, Venkatesh [1]
Affiliations
[1] Univ Victoria, Victoria, BC V8W 2Y2, Canada
Source
2016 7TH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS & APPLICATIONS (IISA) | 2016
Keywords
Fuzzy Join; Similarity Join; MapReduce; Entity Resolution; Record Linkage
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In ICDE'12, Afrati, Das Sarma, Menestrina, Parameswaran and Ullman proposed similarity join algorithms for MapReduce. In this paper, we evaluate and extend their research, testing their proposed algorithms using edit distance and Jaccard similarity. We provide details of adaptations needed to implement their algorithms based on these similarity measures. We conduct an extensive experimental study on large datasets and evaluate the algorithms across several dimensions that define the performance profile in MapReduce.
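For readers unfamiliar with the two measures named in the abstract, the following is a minimal single-machine sketch of edit (Levenshtein) distance and Jaccard similarity. It is only an illustration of the measures themselves, not the MapReduce join algorithms evaluated in the paper; the function names `edit_distance` and `jaccard_similarity` are hypothetical and do not come from the source.

```python
from typing import Hashable, Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; only one previous row is kept.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def jaccard_similarity(x: Iterable[Hashable], y: Iterable[Hashable]) -> float:
    """Jaccard similarity |X ∩ Y| / |X ∪ Y| of two collections,
    treated as sets. Two empty sets are defined here as similarity 1.0
    (an assumption made for this sketch)."""
    sx, sy = set(x), set(y)
    if not sx and not sy:
        return 1.0
    return len(sx & sy) / len(sx | sy)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
```

In a fuzzy join, a pair of records would be reported when its edit distance falls at or below a chosen threshold, or when its Jaccard distance (one minus the similarity) does; the thresholds and the way candidate pairs are distributed across reducers are what the evaluated algorithms differ on.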
Pages: 6
Related papers
50 items in total
  • [41] Fuzzy Associative Classification Algorithm Based on MapReduce Framework
    Bhukya, Raghuram
    Gyani, Jayadev
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON APPLIED AND THEORETICAL COMPUTING AND COMMUNICATION TECHNOLOGY (ICATCCT), 2015, : 357 - 360
  • [42] Understanding Cloud Data Using Approximate String Matching and Edit Distance
    Jupin, Joseph
    Shi, Justin Y.
    Obradovic, Zoran
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1234 - 1243
  • [43] Research on Fuzzy Rules Extraction of Futures Trading Based on MapReduce
    Liu, Xiaolin
    Liu, Xiaodong
    Mu, Yashuang
    Yang, Zhihao
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2017), 2017, : 483 - 488
  • [44] An Improved Fuzzy C-Means Algorithm Based on MapReduce
    Yu, Qing
    Ding, Zhimin
    2015 8TH INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS (BMEI), 2015, : 634 - 638
  • [45] Efficient Similarity Join Based on Earth Mover's Distance Using MapReduce
    Xu, Jia
    Lei, Bin
    Gu, Yu
    Winslett, Marianne
    Yu, Ge
    Zhang, Zhenjie
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (08) : 2148 - 2162
  • [46] Fuzzy improved firefly-based MapReduce for association rule mining
    Driff L.N.
    Drias H.
    International Journal of Innovative Computing and Applications, 2023, 14 (1-2) : 104 - 123
  • [47] Fuzzy Rough Discernibility Matrix Based Feature Subset Selection With MapReduce
    Pavani, Neeli Lakshmi
    Sowkuntla, Pandu
    Rani, K. Swarupa
    Prasad, P. S. V. S. Sai
    PROCEEDINGS OF THE 2019 IEEE REGION 10 CONFERENCE (TENCON 2019): TECHNOLOGY, KNOWLEDGE, AND SOCIETY, 2019, : 389 - 394
  • [48] Fuzzy K-mean Clustering in MapReduce on Cloud Based Hadoop
    Garg, Dweepna
    Trivedi, Khushboo
    2014 INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION CONTROL AND COMPUTING TECHNOLOGIES (ICACCCT), 2014, : 1607 - 1610
  • [49] GeoSimMR: A mapreduce algorithm for detecting communities based on distance and interest in social networks
    Al Aghbari Z.
    Bahutair M.
    Kamel I.
    Data Science Journal, 2019, 18 (01):
  • [50] Fuzzy rule based classification systems for big data with MapReduce: granularity analysis
    Fernandez, Alberto
    del Rio, Sara
    Bawakid, Abdullah
    Herrera, Francisco
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2017, 11 (04) : 711 - 730