Heterogeneous Committee-Based Active Learning for Entity Resolution (HeALER)

Cited by: 8
Authors
Chen, Xiao [1]
Xu, Yinlong [1]
Broneske, David [1]
Durand, Gabriel Campero [1]
Zoun, Roman [1]
Saake, Gunter [1]
Affiliations
[1] Otto von Guericke Univ, Magdeburg, Germany
Source
ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019 | 2019 / Vol. 11695
Keywords
Entity resolution; Query-by-committee-based active learning; Learning-based entity resolution; Record linkage; RULES;
DOI
10.1007/978-3-030-28730-6_5
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Entity resolution identifies records that refer to the same real-world entity. For its classification step, supervised learning can be adopted, but it is limited by the availability of labeled training data. To address this, active learning has been proposed to gather labels while reducing the human labeling effort by selecting the most informative data as candidates for labeling. Committee-based active learning is one of the most commonly used approaches: it chooses the data on which the committee's votes disagree most and treats these as the most informative data. However, the current state-of-the-art committee-based active learning approaches for entity resolution have two main drawbacks. First, the selected initial training data is usually not balanced and informative enough. Second, the committee is formed with homogeneous classifiers by compromising their accuracy to achieve diversity of the committee, i.e., the classifiers are not trained with all available training data or the best parameter setting. In this paper, we propose our committee-based active learning approach HeALER, which overcomes both drawbacks by using more effective initial training data selection approaches and a more effective heterogeneous committee. We implemented HeALER and compared it with passive learning and other state-of-the-art approaches. The experiment results prove that our approach outperforms other state-of-the-art committee-based active learning approaches.
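The selection step described in the abstract (pick the candidates on which the committee disagrees most) can be illustrated with a minimal sketch. It assumes vote entropy as the disagreement measure, which is a common choice for query-by-committee but is not stated in the abstract as the one HeALER uses. The function names and the vote data are hypothetical and serve illustration only.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's votes for one candidate record pair.

    Assumption: vote entropy stands in for the disagreement measure;
    the abstract does not say which measure HeALER uses.
    """
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def most_informative(committee_votes, k=1):
    """Return the indices of the k candidates with the highest disagreement."""
    ranked = sorted(
        range(len(committee_votes)),
        key=lambda i: vote_entropy(committee_votes[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical votes of a three-member committee for four record pairs
# (1 = match, 0 = non-match); one row per candidate pair.
committee_votes = [
    [1, 1, 1],  # unanimous match
    [1, 0, 1],  # split vote
    [0, 0, 0],  # unanimous non-match
    [0, 1, 0],  # split vote
]

# Candidates that would be sent to the human labeler next.
selected = most_informative(committee_votes, k=2)
print(selected)
```

In this sketch the two pairs with split votes are selected for labeling, while the unanimous pairs are skipped because they carry no disagreement.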
Pages: 69-85
Page count: 17