Scaling Up Schema Discovery for RDF Datasets

Cited by: 5
Authors
Bouhamoum, Redouane [1]
Kellou-Menouer, Kenza [1]
Kedad, Zoubida [1]
Lopes, Stephane [1]
Affiliations
[1] DAVID Univ Versailles St Quentin En Yvelines, Versailles, France
Source
2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW) | 2018
DOI
10.1109/ICDEW.2018.00021
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
An increasing number of data sources are published on the Web, expressed in languages proposed by the W3C such as RDF. In these sources, data is not constrained by a schema: the data may differ from the schema-related statements provided in the source, and the schema may be incomplete or missing altogether, which makes the data sources difficult to use. Some works have addressed the problem of automatic schema discovery, but their scalability and their use in a big data context remain a challenge. In this work, we address this scalability issue, which is mainly related to the clustering algorithms at the core of schema discovery. To process large amounts of data, we propose building a condensed representation of the initial dataset by extracting patterns that represent all the existing combinations of properties. Clustering is then performed on the patterns instead of on the initial dataset. In this paper, we describe our approach and present its implementation using a big data technology. We also present experimental evaluations performed on real datasets.
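The pattern extraction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `extract_patterns`, the representation of triples as plain string tuples, and the sample data are all assumptions made for illustration, and the big data technology the paper uses is not reproduced here. The sketch only shows how entities that share the same combination of properties collapse into a single pattern, so that a clustering step would have fewer items to process.

```python
from collections import defaultdict


def extract_patterns(triples):
    """Group entities by the exact set of properties they use.

    `triples` is an iterable of (subject, predicate, object) tuples.
    Returns a dict mapping each distinct property combination (a frozenset)
    to the sorted list of entities that exhibit it. This is a hypothetical
    helper for illustration, not the method described in the paper.
    """
    # Step 1: collect the set of properties attached to each entity.
    properties_by_entity = defaultdict(set)
    for subject, predicate, _obj in triples:
        properties_by_entity[subject].add(predicate)

    # Step 2: entities with identical property sets share one pattern.
    entities_by_pattern = defaultdict(list)
    for entity, properties in properties_by_entity.items():
        entities_by_pattern[frozenset(properties)].append(entity)

    return {pattern: sorted(entities)
            for pattern, entities in entities_by_pattern.items()}


# Made-up sample data: three entities, two distinct property combinations.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:age", "34"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:age", "41"),
    ("ex:paper1", "ex:title", "Schema Discovery"),
]

patterns = extract_patterns(triples)
for pattern, entities in patterns.items():
    print(sorted(pattern), entities)
```

In this sketch a clustering algorithm would receive the keys of `patterns` (two items) in place of the three original entities. On real datasets, where many entities share the same property combination, the reduction could be much larger.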
Pages: 84-89
Number of pages: 6