CampER: An Effective Framework for Privacy-Aware Deep Entity Resolution

Cited by: 2
Authors
Guo, Yuxiang [1]
Chen, Lu [1]
Zhou, Zhengjie [2]
Zheng, Baihua [3]
Fang, Ziquan [1]
Zhang, Zhikun [4]
Mao, Yuren [2]
Gao, Yunjun [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Zhejiang Univ, Ningbo, Peoples R China
[3] Singapore Management Univ, Singapore, Singapore
[4] Stanford Univ, Palo Alto, CA 94304 USA
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Keywords
entity resolution; representation learning; similarity measurement; LINKAGE;
DOI
10.1145/3580305.3599266
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Entity Resolution (ER) is a fundamental problem in data preparation. Standard deep ER methods achieve state-of-the-art effectiveness but assume that relations from different organizations are centrally stored. However, due to privacy concerns, it can be difficult to centralize data in practice, rendering standard deep ER solutions inapplicable. Rule-based privacy-preserving ER methods have been developed, but they often neglect subtle matching mechanisms and therefore have poor effectiveness. To bridge effectiveness and privacy, in this paper, we propose CampER, an effective framework for privacy-aware deep entity resolution. Specifically, we first design a training pair self-generation strategy to overcome the absence of manually labeled data in privacy-aware scenarios. Based on the self-constructed training pairs, we present a collaborative fine-tuning approach to learn match-aware and uni-space individual tuple embeddings for accurate matching decisions. For the matching decision-making process, we first introduce a cryptographically secure approach to determine matches. Furthermore, we propose an order-preserving perturbation strategy to significantly accelerate the matching computation while guaranteeing the consistency of ER results. Extensive experiments on eight widely used benchmark datasets demonstrate that CampER is not only comparable to state-of-the-art standard deep ER solutions in effectiveness but also preserves privacy.
Pages: 626-637
Page count: 12
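
Illustrative note: the abstract states that an order-preserving perturbation keeps ER results consistent, but this record gives no details of how CampER does it. The sketch below is not the authors' algorithm. It only illustrates the general principle under stated assumptions: matches are decided by mutual best similarity (an assumed decision rule), and the perturbation is a single secret, strictly increasing affine map (a hypothetical scheme). The names `mutual_best_matches`, `perturb`, and the sample scores are all invented for illustration.

```python
import random


def mutual_best_matches(scores):
    """Return pairs (i, j) where column j is row i's best score and vice versa.

    Assumed decision rule for illustration only; not taken from the paper.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    best_col = [max(range(n_cols), key=lambda j: scores[i][j]) for i in range(n_rows)]
    best_row = [max(range(n_rows), key=lambda i: scores[i][j]) for j in range(n_cols)]
    return sorted((i, j) for i, j in enumerate(best_col) if best_row[j] == i)


def perturb(scores, rng):
    """Apply one secret, strictly increasing affine map to every score.

    Hypothetical perturbation: a positive scale keeps the ordering of scores.
    """
    scale = rng.uniform(0.5, 2.0)   # strictly positive, so order is preserved
    shift = rng.uniform(-1.0, 1.0)
    return [[scale * s + shift for s in row] for row in scores]


# Made-up similarity scores between three tuples of one party (rows)
# and three tuples of another party (columns).
similarities = [
    [0.91, 0.12, 0.30],
    [0.20, 0.85, 0.41],
    [0.15, 0.39, 0.44],
]

rng = random.Random(7)
masked = perturb(similarities, rng)

plain_matches = mutual_best_matches(similarities)
masked_matches = mutual_best_matches(masked)
print(plain_matches == masked_matches)
```

Because the map is strictly increasing, every comparison between two scores has the same outcome before and after masking, so any decision rule that depends only on score order returns the same matches. A real privacy-aware protocol would need a perturbation with an actual security argument, which this toy map does not provide.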