CampER: An Effective Framework for Privacy-Aware Deep Entity Resolution

Times cited: 4
Authors
Guo, Yuxiang [1]
Chen, Lu [1]
Zhou, Zhengjie [2]
Zheng, Baihua [3]
Fang, Ziquan [1]
Zhang, Zhikun [4]
Mao, Yuren [2]
Gao, Yunjun [1]
Affiliations
[1] Zhejiang University, Hangzhou, People's Republic of China
[2] Zhejiang University, Ningbo, People's Republic of China
[3] Singapore Management University, Singapore, Singapore
[4] Stanford University, Palo Alto, CA 94304, USA
Source
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023 | 2023
Keywords
entity resolution; representation learning; similarity measurement; linkage
DOI
10.1145/3580305.3599266
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Entity Resolution (ER) is a fundamental problem in data preparation. Standard deep ER methods have achieved state-of-the-art effectiveness, assuming that relations from different organizations are centrally stored. However, due to privacy concerns, it can be difficult to centralize data in practice, rendering standard deep ER solutions inapplicable. Although rule-based privacy-preserving ER methods have been developed, they often neglect subtle matching mechanisms and consequently achieve poor effectiveness. To bridge effectiveness and privacy, in this paper, we propose CampER, an effective framework for privacy-aware deep entity resolution. Specifically, we first design a training pair self-generation strategy to overcome the absence of manually labeled data in privacy-aware scenarios. Based on the self-constructed training pairs, we present a collaborative fine-tuning approach to learn match-aware, uni-space individual tuple embeddings for accurate matching decisions. For the matching decision-making process, we first introduce a cryptographically secure approach to determine matches. Furthermore, we propose an order-preserving perturbation strategy that significantly accelerates the matching computation while guaranteeing the consistency of ER results. Extensive experiments on eight widely used benchmark datasets demonstrate that CampER is not only comparable to state-of-the-art standard deep ER solutions in effectiveness but also preserves privacy.
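The abstract's claim that an order-preserving perturbation can keep ER results consistent rests on a simple property: a strictly increasing map applied to similarity scores does not change which candidate scores highest, nor which pairs clear a threshold (provided the threshold passes through the same map). The Python sketch below illustrates only that property. It is not CampER's perturbation strategy or its cryptographic protocol, neither of which is specified in this record; the cosine-similarity matcher, the cubic map f(s) = a·s³ + b·s + c, and all function names are hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Score = Callable[[Vector, Vector], float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def make_order_preserving_map(rng: random.Random) -> Callable[[float], float]:
    """Hypothetical secret perturbation f(s) = a*s**3 + b*s + c with a, b > 0.

    Its derivative 3*a*s**2 + b is always positive, so f is strictly
    increasing: s1 < s2 implies f(s1) < f(s2).
    """
    a = rng.uniform(0.5, 2.0)
    b = rng.uniform(0.5, 2.0)
    c = rng.uniform(-10.0, 10.0)
    return lambda s: a * s ** 3 + b * s + c


def best_matches(left: List[Vector], right: List[Vector], score: Score) -> List[int]:
    """For each left tuple embedding, return the index of its top-scoring right tuple."""
    return [max(range(len(right)), key=lambda j: score(u, right[j])) for u in left]


def threshold_matches(
    left: List[Vector], right: List[Vector], score: Score, tau: float
) -> List[Tuple[int, int]]:
    """Return all (i, j) pairs whose score reaches the threshold tau."""
    return [
        (i, j)
        for i, u in enumerate(left)
        for j, v in enumerate(right)
        if score(u, v) >= tau
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for tuple embeddings from two parties (random, not real data).
    left = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
    right = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(7)]

    f = make_order_preserving_map(random.Random(42))
    perturbed: Score = lambda u, v: f(cosine(u, v))

    # Top-1 decisions agree because f never reorders scores.
    print(best_matches(left, right, cosine) == best_matches(left, right, perturbed))
    # Threshold decisions agree when the threshold is mapped through f as well.
    tau = 0.3
    print(threshold_matches(left, right, cosine, tau)
          == threshold_matches(left, right, perturbed, f(tau)))
```

Because f'(s) = 3a·s² + b > 0 whenever a, b > 0, the map is strictly increasing, so top-1 matches and threshold decisions coincide before and after perturbation (up to floating-point rounding on near-ties). On its own, this toy map offers no formal privacy guarantee; it only shows why order preservation keeps match decisions consistent.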
Pages: 626-637
Number of pages: 12