Heterogeneous Committee-Based Active Learning for Entity Resolution (HeALER)

Cited by: 8
Authors
Chen, Xiao [1]
Xu, Yinlong [1]
Broneske, David [1]
Durand, Gabriel Campero [1]
Zoun, Roman [1]
Saake, Gunter [1]
Affiliation
[1] Otto von Guericke Univ, Magdeburg, Germany
Source
ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019 | 2019 / Vol. 11695
Keywords
Entity resolution; Query-by-committee-based active learning; Learning-based entity resolution; Record linkage; RULES;
DOI
10.1007/978-3-030-28730-6_5
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Entity resolution identifies records that refer to the same real-world entity. For its classification step, supervised learning can be adopted, but it is limited by the availability of labeled training data. In this situation, active learning has been proposed to gather labels while reducing the human labeling effort, by selecting the most informative data as candidates for labeling. Committee-based active learning is one of the most commonly used approaches: it chooses the data on which the committee's votes disagree most, considering these the most informative data. However, the current state-of-the-art committee-based active learning approaches for entity resolution have two main drawbacks. First, the selected initial training data is usually not balanced and informative enough. Second, the committee is formed with homogeneous classifiers by compromising their accuracy to achieve diversity of the committee, i.e., the classifiers are not trained with all available training data or the best parameter setting. In this paper, we propose our committee-based active learning approach HeALER, which overcomes both drawbacks by using more effective initial training data selection approaches and a more effective heterogeneous committee. We implemented HeALER and compared it with passive learning and other state-of-the-art approaches. The experimental results show that our approach outperforms other state-of-the-art committee-based active learning approaches.
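The selection step described in the abstract (pick the candidates on which the committee disagrees most) can be illustrated with a minimal sketch. This is not the HeALER implementation: the abstract does not name the disagreement measure or the classifiers, so vote entropy is assumed here as one common choice, and the three rule functions are hypothetical stand-ins for trained heterogeneous classifiers. The names `vote_entropy`, `select_most_disputed`, and the similarity values in `pool` are likewise illustrative assumptions.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Pair = Sequence[float]              # similarity features of one candidate record pair
Classifier = Callable[[Pair], int]  # returns 1 (match) or 0 (non-match)


def vote_entropy(votes: Sequence[int]) -> float:
    """Entropy of the committee's votes; 0.0 means full agreement."""
    total = len(votes)
    entropy = 0.0
    for count in Counter(votes).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def select_most_disputed(pool: Sequence[Pair],
                         committee: Sequence[Classifier],
                         k: int = 1) -> List[int]:
    """Return indices of the k pool items with the highest vote entropy."""
    scored = [(vote_entropy([clf(pair) for clf in committee]), i)
              for i, pair in enumerate(pool)]
    # Highest disagreement first; ties are broken by pool position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


# Hypothetical stand-ins for heterogeneous committee members.
def mean_rule(pair: Pair) -> int:
    return int(sum(pair) / len(pair) >= 0.5)


def strict_rule(pair: Pair) -> int:
    return int(min(pair) >= 0.5)


def lenient_rule(pair: Pair) -> int:
    return int(max(pair) >= 0.8)


committee = [mean_rule, strict_rule, lenient_rule]

# Illustrative similarity vectors for four unlabeled record pairs.
pool = [(0.95, 0.90), (0.10, 0.05), (0.90, 0.20), (0.55, 0.60)]

to_label = select_most_disputed(pool, committee, k=2)
```

In an active learning loop, the pairs indexed by `to_label` would be sent to a human for labeling, added to the training data, and the committee retrained before the next round. Pairs on which all members agree carry zero vote entropy and are never preferred over disputed ones.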
Pages: 69-85
Number of pages: 17
Related papers
50 in total
  • [21] A Novel Committee-Based Clustering Method
    Fiol-Gonzalez, Sonia
    Almeida, Cassio
    Barbosa, Simone
    Lopes, Helio
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY (DAWAK 2018), 2018, 11031 : 126 - 136
  • [22] Active Blocking Scheme Learning for Entity Resolution
    Shao, Jingyu
    Wang, Qing
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II, 2018, 10938 : 350 - 362
  • [23] Unsupervised Bootstrapping of Active Learning for Entity Resolution
    Primpeli, Anna
    Bizer, Christian
    Keuper, Margret
    SEMANTIC WEB (ESWC 2020), 2020, 12123 : 215 - 231
  • [24] VDFChain: Secure and verifiable decentralized federated learning via committee-based blockchain
    Zhou, Ming
    Yang, Zhen
    Yu, Haiyang
    Yu, Shui
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2024, 223
  • [25] Committee-based sample selection for probabilistic classifiers
    Argamon-Engelson, S
    Dagan, I
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1999, 11 : 335 - 360
  • [26] A Committee-based Byzantine Consensus Protocol for Blockchain
    Meng, Yuli
    Cao, Zhao
    Qu, Dacheng
    PROCEEDINGS OF 2018 IEEE 9TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2018, : 705 - 710
  • [27] Committee-Based Sample Selection for Probabilistic Classifiers
    Argamon-Engelson, Shlomo
    Dagan, Ido
    Journal of Artificial Intelligence Research, 11 (00) : 335 - 360
  • [28] ERABQS: entity resolution based on active machine learning and balancing query strategy
    Mourad, Jabrane
    Hiba, Tabbaa
    Yassir, Rochd
    Imad, Hafidi
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (05) : 1347 - 1373
  • [29] Active deep learning on entity resolution by risk sampling
    Nafa, Youcef
    Chen, Qun
    Chen, Zhaoqiang
    Lu, Xingyu
    He, Haiyang
    Duan, Tianyi
    Li, Zhanhuai
    KNOWLEDGE-BASED SYSTEMS, 2022, 236
  • [30] Active Learning for Large-Scale Entity Resolution
    Qian, Kun
    Popa, Lucian
    Sen, Prithviraj
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 1379 - 1388