Unsupervised genetic programming based linkage rule (UGPLR) Miner for entity linking in semantic web

Cited by: 1
Authors
Singh, Amit [1]
Sharan, Aditi [1]
Affiliations
[1] Jawaharlal Nehru Univ, Sch Comp & Syst Sci, New Delhi, India
Keywords
Semantic web; Linked data; Entity linking; Linked open data; Genetic programming; Blocking;
DOI
10.1007/s12065-019-00263-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the past decade, the Semantic Web data community has focused on publishing and interlinking data. Data publication is now a widespread activity, but more effort needs to be devoted to interlinking data sources. Organizations have been publishing data under different data curation and publication policies, which has resulted in a proliferation of data sources. This proliferation has brought several challenges in interlinking data sources, because different data sources use different properties and descriptions for the same entity. Entity linking, which is at the core of data interlinking, identifies and links instances or records that refer to the same real-world entity. State-of-the-art entity linking approaches are based on supervised learning. Supervised approaches rely on labeled data to learn a good model and suffer in its absence. The cost of labeling is high, and manual labeling is infeasible for datasets with billions of records. In this work, the authors propose a simple heuristic-based approach to generate labeled data. The proposed approach uses the automatically generated labeled data to train an underlying genetic programming based linkage rule-learning model. The proposed approach scales to large datasets and achieves performance comparable to supervised approaches while eliminating the need for manually labeled data. It works in an unsupervised (fully automatic) way while keeping the advantages of supervised approaches, such as high accuracy and low complexity. Experimental analysis shows that the proposed approach is more effective than many state-of-the-art approaches.
Pages: 609-632
Number of pages: 24