Extraction of Validating Shapes from very large Knowledge Graphs

Times cited: 8
Authors
Rabbani, Kashif [1]
Lissandrini, Matteo [1]
Hose, Katja [1]
Affiliation
[1] Aalborg Univ, Aalborg, Denmark
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2023 / Vol. 16 / No. 05
Keywords
FREQUENT; OWL;
DOI
10.14778/3579075.3579078
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Knowledge Graphs (KGs) represent heterogeneous domain knowledge on the Web and within organizations. Shape constraint languages exist for defining validating shapes that ensure the quality of the data in KGs. Existing techniques for extracting validating shapes often fail to extract complete shapes, do not scale, and are prone to producing spurious shapes. To address these shortcomings, we propose the Quality Shapes Extraction (QSE) approach for extracting validating shapes from very large graphs, for which we devise both an exact and an approximate solution. QSE provides information about the reliability of shape constraints by computing their confidence and support within a KG; in doing so, it makes it possible to identify the shapes that are most informative and least likely to be affected by incomplete or incorrect data. To the best of our knowledge, QSE is the first approach to extract a complete set of validating shapes from WikiData. Moreover, QSE reduces extraction time by a factor of 12 compared to existing approaches, while filtering out up to 93% of the invalid and spurious shapes. This reduces the number of constraints presented to the user by up to two orders of magnitude (e.g., from 11,916 to 809 on DBpedia).
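The abstract states that QSE annotates each shape constraint with its support and confidence and uses them to filter out spurious shapes. The Python sketch below illustrates that idea under our own simplifying assumptions; it is not the authors' implementation. We assume that a candidate constraint is a (class, property, object type) triple, that its support is the number of distinct instances of the class having that property with a value of that type, and that its confidence is the support divided by the number of instances of the class. The triple encoding, the `UNTYPED` placeholder, the function names `shape_statistics` and `prune`, and the threshold values are all hypothetical. The sketch covers only exact counting, not the approximate solution mentioned in the abstract.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def shape_statistics(triples):
    """Return {(class, property, object_type): (support, confidence)}.

    Hypothetical input format: (subject, predicate, object) tuples where
    IRIs are strings and literals are ("literal", datatype) tuples.
    """
    # Pass 1: collect the classes of every typed entity.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    # Number of instances per class (the denominator of confidence).
    class_count = defaultdict(int)
    for classes in types.values():
        for c in classes:
            class_count[c] += 1

    # Pass 2: for each candidate constraint, record which entities satisfy it.
    satisfied = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        if isinstance(o, tuple):
            object_types = {o[1]}  # literal: use its datatype
        else:
            object_types = types.get(o, {"UNTYPED"})  # IRI: use its classes
        for c in types.get(s, ()):
            for t in object_types:
                satisfied[(c, p, t)].add(s)

    # Support = distinct satisfying entities; confidence = support / |class|.
    return {
        key: (len(ents), len(ents) / class_count[key[0]])
        for key, ents in satisfied.items()
    }


def prune(stats, min_support, min_confidence):
    """Keep only constraints that meet both (hypothetical) thresholds."""
    return {
        key: (sup, conf)
        for key, (sup, conf) in stats.items()
        if sup >= min_support and conf >= min_confidence
    }


# Toy graph: carol's string-valued ex:worksFor is the kind of erroneous
# value that would otherwise yield a spurious constraint.
SAMPLE_TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:carol", "rdf:type", "ex:Person"),
    ("ex:aau", "rdf:type", "ex:University"),
    ("ex:alice", "ex:worksFor", "ex:aau"),
    ("ex:bob", "ex:worksFor", "ex:aau"),
    ("ex:alice", "ex:name", ("literal", "xsd:string")),
    ("ex:bob", "ex:name", ("literal", "xsd:string")),
    ("ex:carol", "ex:name", ("literal", "xsd:string")),
    ("ex:carol", "ex:worksFor", ("literal", "xsd:string")),
]

stats = shape_statistics(SAMPLE_TRIPLES)
kept = prune(stats, min_support=2, min_confidence=0.5)
for (cls, prop, obj_type), (sup, conf) in sorted(kept.items()):
    print(cls, prop, obj_type, sup, round(conf, 2))
```

On this toy graph, the constraints "every Person has an `ex:name` of type `xsd:string`" (support 3, confidence 1.0) and "Persons work for a University" (support 2, confidence about 0.67) survive, while the constraint induced by carol's string-valued `ex:worksFor` (support 1, confidence about 0.33) is pruned. Raising or lowering the two thresholds trades completeness of the extracted shapes against robustness to incorrect data.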
Pages: 1023-1032
Page count: 10