Empowering Transformer with Hybrid Matching Knowledge for Entity Matching

Cited by: 2
Authors
Dou, Wenzhou [1]
Shen, Derong [1]
Nie, Tiezheng [1]
Kou, Yue [1]
Sun, Chenchen [2]
Cui, Hang [3]
Yu, Ge [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin, Peoples R China
[3] Univ Illinois, Champaign, IL USA
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III | 2022
Funding
National Natural Science Foundation of China;
Keywords
Entity matching; Transformer; Pretrained language model; Hybrid matching graph; Graph contrastive learning;
DOI
10.1007/978-3-031-00129-1_4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Transformers have achieved great success in many NLP tasks. The self-attention mechanism of the Transformer learns powerful representations by conducting token-level pairwise interactions within the input sequence. In this paper, we propose a novel entity matching framework named GTA. GTA enhances the Transformer for relational data representation by injecting additional hybrid matching knowledge. This knowledge is obtained via graph contrastive learning on a purpose-built hybrid matching graph, which models dual-level matching and multi-granularity interactions. In this way, GTA utilizes the prelearned knowledge of both hybrid matching and language modeling, which enables the Transformer to capture the structural features of relational data when performing entity matching. Extensive experiments on open datasets show that GTA effectively enhances the Transformer for relational data representation and outperforms state-of-the-art entity matching frameworks.
Pages: 52-67
Page count: 16