Heterogeneous Committee-Based Active Learning for Entity Resolution (HeALER)

Cited by: 8
Authors
Chen, Xiao [1]
Xu, Yinlong [1]
Broneske, David [1]
Durand, Gabriel Campero [1]
Zoun, Roman [1]
Saake, Gunter [1]
Affiliation
[1] Otto von Guericke Univ, Magdeburg, Germany
Source
ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019 | 2019 / Vol. 11695
Keywords
Entity resolution; Query-by-committee-based active learning; Learning-based entity resolution; Record linkage; RULES;
DOI
10.1007/978-3-030-28730-6_5
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Entity resolution identifies records that refer to the same real-world entity. Supervised learning can be adopted for its classification step, but it is limited by the availability of labeled training data. To address this, active learning has been proposed to gather labels while reducing the human labeling effort by selecting the most informative data as candidates for labeling. Committee-based active learning is one of the most commonly used approaches: it chooses the data on which the committee's votes disagree most and treats these as the most informative data. However, the current state-of-the-art committee-based active learning approaches for entity resolution have two main drawbacks. First, the selected initial training data is usually not balanced and informative enough. Second, the committee is formed from homogeneous classifiers that compromise their accuracy to achieve committee diversity, i.e., the classifiers are not trained with all available training data or with the best parameter setting. In this paper, we propose our committee-based active learning approach HeALER, which overcomes both drawbacks by using more effective initial training data selection approaches and a more effective heterogeneous committee. We implemented HeALER and compared it with passive learning and other state-of-the-art approaches. The experimental results show that our approach outperforms other state-of-the-art committee-based active learning approaches.
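The disagreement-based selection described in the abstract can be illustrated with a minimal sketch. It is not the HeALER implementation: the vote-entropy measure is one common way to quantify committee disagreement and is an assumption here, as are the names `vote_entropy` and `select_queries`, the toy threshold rules standing in for a heterogeneous committee, and the similarity vectors in `pool`.

```python
import math
from typing import Callable, List, Sequence

# A committee member maps a similarity vector for a record pair
# to 1 (match) or 0 (non-match). Hypothetical type for this sketch.
Classifier = Callable[[Sequence[float]], int]


def vote_entropy(votes: Sequence[int]) -> float:
    """Entropy of the committee's votes; 0.0 means full agreement."""
    n = len(votes)
    entropy = 0.0
    for label in set(votes):
        p = votes.count(label) / n
        entropy -= p * math.log2(p)
    return entropy


def select_queries(committee: Sequence[Classifier],
                   pool: Sequence[Sequence[float]],
                   batch_size: int) -> List[int]:
    """Return indices of the pool items the committee disagrees on most."""
    scored = []
    for index, pair in enumerate(pool):
        votes = [member(pair) for member in committee]
        scored.append((vote_entropy(votes), index))
    # Highest disagreement first.
    scored.sort(key=lambda item: -item[0])
    return [index for _, index in scored[:batch_size]]


# Toy stand-ins for different classifier types (assumed, not from the paper).
committee = [
    lambda x: int(x[0] > 0.5),            # rule on the first similarity
    lambda x: int(sum(x) / len(x) > 0.6),  # rule on the mean similarity
    lambda x: int(min(x) > 0.3),          # rule on the weakest similarity
]

# Unlabeled record pairs, each described by two similarity scores.
pool = [(0.9, 0.9), (0.1, 0.2), (0.55, 0.4), (0.7, 0.2)]

to_label = select_queries(committee, pool, batch_size=2)
```

In this toy pool the committee is unanimous on the first two pairs and split on the last two, so the last two are the ones handed to the human labeler. In a real setting the committee members would be trained classifiers retrained after each labeling round.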
Pages: 69 - 85
Number of pages: 17