CORDEL: A Contrastive Deep Learning Approach for Entity Linkage

Cited by: 15
Authors
Wang, Zhengyang [1]
Sisman, Bunyamin [2]
Wei, Hao [2]
Dong, Xin Luna [2]
Ji, Shuiwang [1]
Affiliations
[1] Texas A&M University, College Station, TX 77843, USA
[2] Amazon.com, Seattle, WA, USA
Source
20th IEEE International Conference on Data Mining (ICDM 2020) | 2020
Keywords
entity linkage; twin network; deep learning
DOI
10.1109/ICDM50108.2020.00171
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Entity linkage (EL) is a critical problem in data cleaning and integration. In the past several decades, EL has typically been done by rule-based systems or traditional machine learning models with hand-curated features, both of which depend heavily on manual human input. With the ever-increasing growth of new data, deep learning (DL) based approaches have been proposed to alleviate the high cost of EL associated with the traditional models. Existing exploration of DL models for EL strictly follows the well-known twin-network architecture. However, we argue that the twin-network architecture is sub-optimal for EL, leading to inherent drawbacks of existing models. To address these drawbacks, we propose a novel and generic contrastive DL framework for EL. The proposed framework is able to capture both syntactic and semantic matching signals and pays attention to subtle but critical differences. Based on the framework, we develop a contrastive DL approach for EL, CORDEL, with a simple yet powerful variant called CORDEL-Sum. We evaluate CORDEL with extensive experiments conducted on both public benchmark datasets and a real-world dataset. CORDEL outperforms previous state-of-the-art models by 5.2% on public benchmark datasets. Moreover, CORDEL yields a 29.4% improvement over the current best DL model on the real-world dataset, while reducing the number of training parameters by 96.8%.
Pages: 1322-1327
Page count: 6