Unsupervised genetic programming based linkage rule (UGPLR) Miner for entity linking in semantic web

Cited by: 1
Authors
Singh, Amit [1]
Sharan, Aditi [1]
Affiliations
[1] Jawaharlal Nehru Univ, Sch Comp & Syst Sci, New Delhi, India
Keywords
Semantic web; Linked data; Entity linking; Linked open data; Genetic programming; Blocking
DOI
10.1007/s12065-019-00263-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the past decade, the Semantic Web community has focused on publishing and interlinking data. Data publication is now a widespread activity, but more effort needs to be devoted to interlinking data sources. Organizations publish data under different curation and publication policies, which has led to a proliferation of data sources. This proliferation poses several challenges for interlinking: different data sources use different properties and descriptions to describe the same entity. The entity linking problem is at the core of data interlinking; it identifies and links instances and records that refer to the same real-world entity. State-of-the-art entity linking approaches are based on supervised learning. Supervised approaches rely on labeled data to learn a good model and perform poorly when labeled data is absent. The cost of labeling is high, and manual labeling is infeasible for datasets with billions of records. In this work, the authors propose a simple heuristic-based approach to generate labeled data. The proposed approach uses this automatically generated labeled data to train an underlying genetic programming based linkage rule learning model. The approach scales to large datasets and achieves performance comparable to supervised approaches while eliminating the need for manually labeled data. It works in an unsupervised (fully automatic) way while keeping the advantages of supervised approaches, such as high accuracy and low complexity. Experimental analysis shows that the proposed approach is more effective than many state-of-the-art approaches.
Pages: 609-632
Number of pages: 24