Scaling Up Schema Discovery for RDF Datasets

Cited by: 5
Authors
Bouhamoum, Redouane [1]
Kellou-Menouer, Kenza [1]
Kedad, Zoubida [1]
Lopes, Stephane [1]
Affiliations
[1] DAVID Univ Versailles St Quentin En Yvelines, Versailles, France
Source
2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW) | 2018
DOI
10.1109/ICDEW.2018.00021
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An increasing number of data sources are published on the Web, expressed in the languages proposed by the W3C such as RDF. In these sources, data is not constrained by a schema: the data may differ from the schema-related statements provided in the source, and the schema may be incomplete or missing altogether, which makes the data sources difficult to use. Some works have addressed the problem of automatic schema discovery, but their scalability and their use in a big data context remain a challenge. In this work, we address this scalability issue, which is mainly related to the clustering algorithms at the core of schema discovery. In order to process large amounts of data, we propose to build a condensed representation of the initial dataset by extracting patterns representing all the existing combinations of properties. The clustering is then performed on the patterns instead of the initial dataset. In this paper, we describe our approach and present its implementation using a big data technology. We also present experimental evaluations performed on real datasets.
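The pattern-extraction idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (the abstract does not name the big data technology or the clustering algorithm used); it assumes that a pattern is simply the exact set of properties an entity uses, and the function name `extract_patterns` and the `ex:` sample triples are hypothetical.

```python
from collections import Counter, defaultdict


def extract_patterns(triples):
    """Return each distinct property combination with its entity count.

    Assumption: a "pattern" is the exact set of predicates attached to a
    subject. The abstract does not give the authors' precise definition.
    """
    props_by_entity = defaultdict(set)
    for subject, predicate, _obj in triples:
        props_by_entity[subject].add(predicate)
    # One pattern per distinct property set; the count records how many
    # entities share that set.
    return Counter(frozenset(props) for props in props_by_entity.values())


# Hypothetical sample data: four entities, three distinct property sets.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "a@example.org"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:email", "b@example.org"),
    ("ex:paper1", "ex:title", "T"),
    ("ex:paper1", "ex:author", "ex:alice"),
    ("ex:carol", "ex:name", "Carol"),
]

patterns = extract_patterns(triples)
for pattern, count in patterns.items():
    print(sorted(pattern), count)
```

A clustering step would then take the distinct patterns as input rather than every entity, which is where the reduction in input size comes from when many entities share the same property combination.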
Pages: 84-89
Page count: 6
Related papers
50 items in total
  • [41] RDF schema based ubiquitous Healthcare service composition
    Lee, W
    Sohn, MM
    Kim, JH
    Ha, BH
    Kang, SH
    E-COMMERCE AND WEB TECHNOLOGIES, PROCEEDINGS, 2005, 3590 : 208 - 217
  • [42] Faceted exploration of RDF/S datasets: a survey
    Tzitzikas, Yannis
    Manolis, Nikos
    Papadakos, Panagiotis
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2017, 48 (02) : 329 - 364
  • [43] Theme-Based Summarization for RDF Datasets
    Rihany, Mohamad
    Kedad, Zoubida
    Lopes, Stephane
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2020, PT II, 2020, 12392 : 312 - 321
  • [44] A Comparison of Two Strategies for Scaling Up Instance Selection in Huge Datasets
    de Haro-Garcia, Aida
    Perez-Rodriguez, Javier
    Garcia-Pedrajas, Nicolas
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 7023 : 64 - 73
  • [45] Evaluating the Gap Between an RDF Dataset and Its Schema
    Kellou-Menouer, Kenza
    Kedad, Zoubida
    ADVANCES IN CONCEPTUAL MODELING, ER 2015 WORKSHOPS, 2015, 9382 : 283 - 292
  • [46] Sorted Neighborhood for Schema-Free RDF Data
    Kejriwal, Mayank
    Miranker, Daniel P.
    SEMANTIC WEB: ESWC 2015 SATELLITE EVENTS, 2015, 9341 : 217 - 229
  • [47] A survey of RDF management technologies and benchmark datasets
    Zhengyu Pan
    Tao Zhu
    Hong Liu
    Huansheng Ning
    Journal of Ambient Intelligence and Humanized Computing, 2018, 9 : 1693 - 1704
  • [48] Enabling knowledge representation on the Web by extending RDF Schema
    Broekstra, J
    Klein, M
    Decker, S
    Fensel, D
    van Harmelen, F
    Horrocks, I
    COMPUTER NETWORKS, 2002, 39 (05) : 609 - 634
  • [49] A survey of RDF management technologies and benchmark datasets
    Pan, Zhengyu
    Zhu, Tao
    Liu, Hong
    Ning, Huansheng
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2018, 9 (05) : 1693 - 1704
  • [50] HDTQ: Managing RDF Datasets in Compressed Space
    Fernandez, Javier D.
    Martinez-Prieto, Miguel A.
    Polleres, Axel
    Reindorf, Julian
    SEMANTIC WEB (ESWC 2018), 2018, 10843 : 191 - 208