Heterogeneous Committee-Based Active Learning for Entity Resolution (HeALER)

Cited by: 8
Authors
Chen, Xiao [1]
Xu, Yinlong [1]
Broneske, David [1]
Durand, Gabriel Campero [1]
Zoun, Roman [1]
Saake, Gunter [1]
Affiliations
[1] Otto von Guericke Univ, Magdeburg, Germany
Source
ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019 | 2019 / Vol. 11695
Keywords
Entity resolution; Query-by-committee-based active learning; Learning-based entity resolution; Record linkage; RULES;
DOI
10.1007/978-3-030-28730-6_5
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Entity resolution identifies records that refer to the same real-world entity. Supervised learning can be adopted for its classification step, but it is limited by the availability of labeled training data. To address this, active learning has been proposed to gather labels while reducing the human labeling effort, by selecting the most informative data as candidates for labeling. Committee-based active learning is one of the most commonly used approaches: it chooses the data on which the committee's votes disagree most, and treats these as the most informative data. However, the current state-of-the-art committee-based active learning approaches for entity resolution have two main drawbacks. First, the selected initial training data is usually not balanced and informative enough. Second, the committee is formed from homogeneous classifiers whose accuracy is compromised to achieve diversity in the committee, i.e., the classifiers are not trained with all available training data or with the best parameter setting. In this paper, we propose our committee-based active learning approach HeALER, which overcomes both drawbacks by using more effective initial training data selection approaches and a more effective heterogeneous committee. We implemented HeALER and compared it with passive learning and other state-of-the-art approaches. The experimental results show that our approach outperforms other state-of-the-art committee-based active learning approaches.
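The selection step described in the abstract, picking the candidate on which the committee disagrees most, can be illustrated with a minimal sketch. This is not the HeALER implementation: the disagreement measure (vote entropy), the three toy committee members, and the similarity-score tuples are all assumptions made for illustration only.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy of the committee's votes; 0 means full agreement."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_most_informative(candidates, committee, batch_size=1):
    """Return the candidates whose committee votes disagree the most."""
    scored = []
    for pair in candidates:
        votes = [member(pair) for member in committee]
        scored.append((vote_entropy(votes), pair))
    # Highest disagreement first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:batch_size]]


# Hypothetical heterogeneous committee: three different decision rules over
# a record pair's similarity scores (stand-ins for trained classifiers).
committee = [
    lambda sims: sum(sims) / len(sims) >= 0.5,  # mean similarity rule
    lambda sims: min(sims) >= 0.4,              # strict rule: every score must be high
    lambda sims: max(sims) >= 0.8,              # lenient rule: one strong score suffices
]

# Hypothetical candidate record pairs as (name_similarity, address_similarity).
candidates = [(0.95, 0.9), (0.1, 0.05), (0.85, 0.2)]

to_label = select_most_informative(candidates, committee)
print(to_label)
```

In this toy setup the committee agrees on the first two pairs and splits on the third, so the third pair is the one that would be sent to a human for labeling.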
Pages: 69-85
Page count: 17