Unsupervised genetic programming based linkage rule (UGPLR) Miner for entity linking in semantic web

Cited by: 0
Authors
Amit Singh
Aditi Sharan
Affiliation
[1] Jawaharlal Nehru University, School of Computer and Systems Sciences
Source
Evolutionary Intelligence | 2019 / Volume 12
Keywords
Semantic web; Linked data; Entity linking; Linked open data; Genetic programming; Blocking
DOI
Not available
Abstract
In the past decade, the Semantic Web data community has focused on publishing and interlinking data. Data publication is now a widely practiced activity, but more effort needs to be devoted to interlinking data sources. Organizations have been publishing data under different data curation and publication policies, which has resulted in a proliferation of data sources. This proliferation has brought several challenges in interlinking data sources: different sources use different properties and descriptions to describe the same entity. The entity linking problem is at the core of data interlinking; it identifies and links instances or records that refer to the same real-world entity. State-of-the-art entity linking approaches are based on supervised learning. Supervised approaches rely on labeled data to learn a good model and suffer in its absence. The cost of labeling is high, and manual labeling is infeasible for datasets with billions of records. In this work, the authors propose a simple heuristic-based approach to generate the labeled data. The approach uses the automatically generated labeled data to train an underlying genetic programming based linkage rule learning model. It scales to large datasets and achieves performance comparable to supervised approaches while eliminating the need for manually labeled data. It works in an unsupervised (fully automatic) way while keeping the advantages of supervised approaches, such as high accuracy and low complexity. Experimental analysis shows that the proposed approach is more effective than many state-of-the-art approaches.
Pages: 609-632
Page count: 23