Empowering Transformer with Hybrid Matching Knowledge for Entity Matching

Cited by: 2
Authors
Dou, Wenzhou [1]
Shen, Derong [1]
Nie, Tiezheng [1]
Kou, Yue [1]
Sun, Chenchen [2]
Cui, Hang [3]
Yu, Ge [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin, Peoples R China
[3] Univ Illinois, Champaign, IL USA
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III | 2022
Funding
National Natural Science Foundation of China
Keywords
Entity matching; Transformer; Pretrained language model; Hybrid matching graph; Graph contrastive learning;
DOI
10.1007/978-3-031-00129-1_4
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Transformers have achieved great success in many NLP tasks. The self-attention mechanism of the Transformer learns powerful representations by conducting token-level pairwise interactions within the input sequence. In this paper, we propose a novel entity matching framework named GTA. GTA enhances the Transformer for relational data representation by injecting additional hybrid matching knowledge. This knowledge is obtained via graph contrastive learning on a specially designed hybrid matching graph, in which dual-level matching and multi-granularity interactions are modeled. In this way, GTA utilizes the prelearned knowledge of both hybrid matching and language modeling, which enables the Transformer to capture the structural features of relational data when performing entity matching. Extensive experiments on open datasets show that GTA effectively enhances the Transformer for relational data representation and outperforms state-of-the-art entity matching frameworks.
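The token-level pairwise interaction mentioned in the abstract can be illustrated with a minimal sketch of scaled dot-product self-attention. This is not GTA's implementation: the function names, the identity query/key/value projections, and the toy embedding values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Score this token against every token in the sequence (pairwise interaction).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of all token vectors.
        outputs.append(
            [sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(d)]
        )
    return outputs, weights


# Hypothetical 2-dimensional embeddings for three tokens of a serialized record pair.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
```

In this toy input the first two tokens have similar embeddings, so the first token assigns more attention weight to the second token than to the third. A real Transformer adds learned projections, multiple heads, and positional information on top of this core step.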
Pages: 52-67
Page count: 16
Related papers
39 in total
  • [21] Li B, 2020, AAAI CONF ARTIF INTE, V34, P8172
  • [22] Deep Entity Matching with Pre-Trained Language Models
    Li, Yuliang
    Li, Jinfeng
    Suhara, Yoshihiko
    Doan, AnHai
    Tan, Wang-Chiew
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (01): 50 - 60
  • [23] Deep Entity Matching: Challenges and Opportunities
    Li, Yuliang
    Li, Jinfeng
    Suhara, Yoshihiko
    Wang, Jin
    Hirota, Wataru
    Tan, Wang-Chiew
    [J]. ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2021, 13 (01):
  • [24] Liu YH, 2019, Arxiv, DOI [arXiv:1907.11692, DOI 10.48550/ARXIV.1907.11692]
  • [25] Liu YX, 2022, Arxiv, DOI arXiv:2103.00111
  • [26] Marcus A, 2011, PROC VLDB ENDOW, V5, P13
  • [27] Deep Learning for Entity Matching: A Design Space Exploration
    Mudgal, Sidharth
    Li, Han
    Rekatsinas, Theodoros
    Doan, Anhai
    Park, Youngchoon
    Krishnan, Ganesh
    Deep, Rohit
    Arcaute, Esteban
    Raghavendra, Vijay
    [J]. SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018: 19 - 34
  • [28] Graph Learning for Combinatorial Optimization: A Survey of State-of-the-Art
    Peng, Yun
    Choi, Byron
    Xu, Jianliang
    [J]. DATA SCIENCE AND ENGINEERING, 2021, 6 (02): 119 - 141
  • [29] Sanh V, 2020, Arxiv, DOI [arXiv:1910.01108, DOI 10.48550/ARXIV.1910.01108, 10.48550/arXiv.1910.01108]
  • [30] Synthesizing Entity Matching Rules by Examples
    Singh, Rohit
    Meduri, Venkata Vamsikrishna
    Elmagarmid, Ahmed
    Madden, Samuel
    Papotti, Paolo
    Quiane-Ruiz, Jorge-Arnulfo
    Solar-Lezama, Armando
    Tang, Nan
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 11 (02): 189 - 202