Scaling Up Schema Discovery for RDF Datasets

Cited by: 5
Authors
Bouhamoum, Redouane [1]
Kellou-Menouer, Kenza [1]
Kedad, Zoubida [1]
Lopes, Stephane [1]
Affiliations
[1] DAVID Univ Versailles St Quentin En Yvelines, Versailles, France
Source
2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW) | 2018
DOI
10.1109/ICDEW.2018.00021
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An increasing number of data sources is published on the Web, expressed using the languages proposed by the W3C such as RDF. In these sources, data is not constrained by a schema: data could differ from the schema-related statements provided in the source; furthermore, the schema could be incomplete or even missing; this makes the use of the data sources difficult. Some works have addressed the problem of automatic schema discovery but their scalability and their use in a big data context remain a challenge. In this work, we address this scalability issue, which is mainly related to the clustering algorithms at the core of schema discovery. In order to process large amounts of data, we propose to build a condensed representation of the initial dataset by extracting patterns representing all the existing combinations of properties. The clustering is then performed on the patterns instead of the initial dataset. In this paper, we describe our approach, and present its implementation using a big data technology. We also present some experimental evaluations performed on real datasets.
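The condensed representation described in the abstract can be illustrated with a minimal sketch. It assumes that a pattern is simply the set of properties attached to one entity (subject) and that patterns could later be compared with Jaccard similarity. Both choices, along with the names `extract_patterns` and `jaccard` and the sample triples, are illustrative assumptions and are not taken from the paper, which also uses a big data technology rather than in-memory Python.

```python
from collections import Counter, defaultdict


def extract_patterns(triples):
    """Group triples by subject and count each distinct property combination.

    Hypothetical helper: a "pattern" is assumed to be the frozenset of
    predicates observed for one subject.
    """
    props_by_entity = defaultdict(set)
    for subject, predicate, _obj in triples:
        props_by_entity[subject].add(predicate)
    # Many entities usually share the same combination, so the number of
    # patterns can be much smaller than the number of entities.
    return Counter(frozenset(props) for props in props_by_entity.values())


def jaccard(a, b):
    """Jaccard similarity between two property sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy dataset of (subject, predicate, object) triples, invented for illustration.
triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:mbox", "bob@example.org"),
    ("ex:paper1", "dc:title", "A title"),
    ("ex:paper1", "dc:creator", "ex:alice"),
    ("ex:carol", "foaf:name", "Carol"),
]

patterns = extract_patterns(triples)
for pattern, count in patterns.items():
    print(sorted(pattern), count)
```

In this toy input, four entities collapse into three patterns. A clustering step would then operate on the patterns (optionally weighted by their counts) instead of on every entity, which is the source of the scalability gain the abstract refers to.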
Pages: 84-89
Page count: 6
Related papers
50 in total
  • [31] Interpreting XML documents via an RDF Schema ontology
    Klein, M
    13TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2002, : 889 - 893
  • [32] Exploiting XML Schema for Interpreting XML Documents as RDF
    Thuy, Pham Thi Thu
    Lee, Young-Koo
    Lee, Sungyoung
    Jeong, Byeong-Soo
    2008 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING, PROCEEDINGS, VOL 2, 2008, : 555 - 558
  • [33] Deriving an Emergent Relational Schema from RDF Data
    Minh-Duc Pham
    Passing, Linnea
    Erling, Orri
    Boncz, Peter
    PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW 2015), 2015, : 864 - 874
  • [34] RDF-based schema mediation for database grid
    Chen, HJ
    Wu, ZH
    Zheng, GZ
    Mao, YX
    FIFTH IEEE/ACM INTERNATIONAL WORKSHOP ON GRID COMPUTING, PROCEEDINGS, 2004, : 456 - 460
  • [35] Theoretical and experimental procedure for scaling-up RDF gasifiers: The Gibbs Gradient Method
    Barba, Diego
    Capocelli, Mauro
    Cornacchia, Giacinto
    Matera, Domenico A.
    Fuel, 2016, 179 : 60 - 70
  • [37] A Scalable Framework for Quality Assessment of RDF Datasets
    Sejdiu, Gezim
    Rula, Anisa
    Lehmann, Jens
    Jabeen, Hajira
    SEMANTIC WEB - ISWC 2019, PT II, 2019, 11779 : 261 - 276
  • [38] Fast and Practical Snippet Generation for RDF Datasets
    Liu, Daxin
    Cheng, Gong
    Liu, Qingxia
    Qu, Yuzhong
    ACM TRANSACTIONS ON THE WEB, 2019, 13 (04)
  • [39] Materializing Inferred and Uncertain Knowledge in RDF Datasets
    McGlothlin, James P.
    Khan, Latifur
    PROCEEDINGS OF THE TWENTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-10), 2010, : 1951 - 1952
  • [40] On ranking RDF schema elements (and its application in visualization)
    Tzitzikas, Yannis
    Kotzinos, Dimitris
    Theoharis, Yannis
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2007, 13 (12) : 1854 - 1880