Scaling Up Schema Discovery for RDF Datasets

Cited by: 5
Authors
Bouhamoum, Redouane [1]
Kellou-Menouer, Kenza [1]
Kedad, Zoubida [1]
Lopes, Stephane [1]
Affiliation
[1] DAVID Univ Versailles St Quentin En Yvelines, Versailles, France
Source
2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW) | 2018
DOI
10.1109/ICDEW.2018.00021
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
An increasing number of data sources are published on the Web, expressed in languages proposed by the W3C such as RDF. In these sources, data is not constrained by a schema: the data may deviate from the schema-related statements provided in the source, and the schema itself may be incomplete or missing altogether, which makes the data sources difficult to use. Some works have addressed automatic schema discovery, but their scalability and their use in a big data context remain a challenge. In this work, we address this scalability issue, which is mainly related to the clustering algorithms at the core of schema discovery. To process large amounts of data, we propose building a condensed representation of the initial dataset by extracting patterns that represent all existing combinations of properties. Clustering is then performed on the patterns instead of the initial dataset. In this paper, we describe our approach and present its implementation using a big data technology. We also present experimental evaluations performed on real datasets.
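The pattern idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither the clustering algorithm nor the big data technology, so the code below only shows, in plain Python, how entities might be collapsed into distinct property-set patterns and how two patterns could be compared. The function names `extract_patterns` and `jaccard`, the `ex:` sample triples, and the choice of Jaccard similarity are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def extract_patterns(triples):
    """Collapse entities into patterns (hypothetical helper).

    Each subject is mapped to the set of properties it uses; identical
    property sets are merged into one pattern with an occurrence count.
    """
    props_by_subject = defaultdict(set)
    for subject, prop, _obj in triples:
        props_by_subject[subject].add(prop)
    return Counter(frozenset(props) for props in props_by_subject.values())


def jaccard(a, b):
    """Jaccard similarity between two property sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up sample data: four entities described by seven triples.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:worksAt", "ex:uvsq"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:worksAt", "ex:uvsq"),
    ("ex:carol", "ex:name", "Carol"),
    ("ex:uvsq", "ex:label", "UVSQ"),
    ("ex:uvsq", "ex:locatedIn", "ex:versailles"),
]

patterns = extract_patterns(triples)
for pattern, count in patterns.items():
    print(sorted(pattern), count)
```

In this toy input, two entities share the same property set and are merged into one pattern, so a clustering step would compare three patterns rather than four entities. On real datasets the reduction depends on how many entities share identical property combinations.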
Pages: 84-89
Page count: 6