Enhancing Entity Resolution with a hybrid Active Machine Learning framework: Strategies for optimal learning in sparse datasets

Cited by: 3
Authors
Jabrane, Mourad [1]
Tabbaa, Hiba [1]
Hadri, Aissam [2]
Hafidi, Imad [1]
Affiliations
[1] Sultan Moulay Slimane Univ, Lab Proc Engn Comp Sci & Math, Khouribga 25000, Morocco
[2] Ibnou Zohr Univ, Multidisciplinary Fac, Agadir 80000, Morocco
Keywords
Entity Resolution; Active Machine Learning; Informativeness; Representativeness; Linkage
DOI
10.1016/j.is.2024.102410
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
When solving the problem of identifying similar records in different datasets (known as Entity Resolution or ER), one big challenge is the lack of enough labeled data. Such data is crucial for building strong machine learning models, but obtaining it can be expensive and time-consuming. Active Machine Learning (ActiveML) helps because it selects the most useful pieces of data to learn from. It relies on two main ideas: informativeness and representativeness. Typical ActiveML methods used in ER depend too heavily on just one of these ideas, which can make them less effective, especially when starting with very little data. Our research introduces a new combined method that uses both ideas together. We created two versions of this method, called DPQ and STQ, and tested them on eleven real-world datasets. The results showed that, compared to existing methods, our new method improves ER by producing better scores, more stable models, and faster learning with less training data.
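To make the two selection criteria concrete, the following is a minimal sketch of one generic way to combine informativeness and representativeness when choosing which record pair to label next. It is not the paper's DPQ or STQ method, whose details this record does not give. It assumes an information-density style weighting, in which an uncertainty score is multiplied by an average-similarity score. The function names, the `beta` parameter, and the sample data are all hypothetical.

```python
import math


def uncertainty(p_match):
    """Informativeness: 1.0 when the model is undecided (p = 0.5), 0.0 when it is sure."""
    return 1.0 - abs(2.0 * p_match - 1.0)


def density(index, vectors):
    """Representativeness: mean similarity of one candidate pair to all the others."""
    others = [v for j, v in enumerate(vectors) if j != index]
    if not others:
        return 1.0
    sims = [1.0 / (1.0 + math.dist(vectors[index], v)) for v in others]
    return sum(sims) / len(sims)


def select_query(p_matches, vectors, beta=1.0):
    """Return the index of the unlabeled pair with the highest hybrid score.

    beta (an assumed knob) controls how much representativeness counts;
    beta = 0 reduces the rule to plain uncertainty sampling.
    """
    scores = [
        uncertainty(p) * density(i, vectors) ** beta
        for i, p in enumerate(p_matches)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical similarity vectors for four candidate record pairs.
# The first three are close together; the last one is an outlier.
vectors = [[0.50, 0.50], [0.52, 0.48], [0.49, 0.51], [0.95, 0.05]]
# Hypothetical match probabilities from some current classifier.
p_matches = [0.60, 0.55, 0.90, 0.50]

chosen = select_query(p_matches, vectors)                    # hybrid rule
chosen_uncertainty_only = select_query(p_matches, vectors, beta=0.0)
```

With this toy data, uncertainty alone picks the outlier pair (index 3), while the hybrid rule picks index 1, a pair that is nearly as uncertain but sits in the dense region. This illustrates why relying on a single criterion can waste labels when very little labeled data is available.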
Pages: 12