Scalable long-term preservation of relational data through SPARQL queries

Cited by: 0

Authors:

Stefanova, Silvia [1]

Risch, Tore [1]

Affiliations:

[1] Uppsala Univ, Dept Informat Technol, Box 337, SE-75105 Uppsala, Sweden

Source:

SEMANTIC WEB | 2016 / Vol. 7 / Issue 02

Keywords:

Database archival; benchmark; SPARQL optimization; SPARQL views of relational databases; unbound-property queries

DOI:

10.3233/SW-150173

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We present an approach for scalable long-term preservation of data stored in relational databases (RDBs) as RDF, implemented in the SAQ (Semantic Archive and Query) system. The approach is suited to archiving scientific data used in scientific publications, where it is desirable to preserve only parts of an RDB, e.g., only data about a specific set of experimental artefacts in the database. Long-term preservation as RDF of selected parts of a database is specified as an archival query in an extended SPARQL dialect, A-SPARQL. Query processing is based on automatically generating an RDF view of the relational database to be archived, called the RD-view. A-SPARQL provides flexible selection of the data to be archived in terms of a SPARQL-like query to the RD-view. The result of an archival query is a data archive file containing the RDF triples that represent the relational data content to be preserved. The system also generates a schema archive file in which sufficient metadata are saved to allow the archived database to be fully reconstructed. An archival query usually selects both properties and their values for sets of subjects, which makes the property p in some triple patterns unknown. We call such queries, where properties are unknown, unbound-property queries. To achieve scalable data preservation and recreation, we propose query transformation strategies suitable for optimizing unbound-property queries. These query rewriting strategies were implemented and evaluated in a new benchmark for archival queries called ABench. ABench is defined as a set of typical A-SPARQL queries archiving selected parts of databases generated by the Berlin benchmark data generator. In experiments, the SAQ optimization strategies were evaluated by measuring the performance of A-SPARQL queries selecting triples for archival in ABench. The performance of equivalent SPARQL queries for related systems was also measured. The results showed that the proposed optimizations substantially improve the query execution time for archival queries.
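As a rough illustration of what the abstract calls an unbound-property query, the sketch below matches triple patterns over a tiny in-memory list of triples. It is plain Python, not A-SPARQL and not the SAQ implementation; the data, `match`, and `archive_subjects` are hypothetical names invented for this example.

```python
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)

# Hypothetical RDF-style view of two relational rows (made-up data).
TRIPLES: List[Triple] = [
    ("product/1", "label", "Widget"),
    ("product/1", "price", "9.50"),
    ("product/2", "label", "Gadget"),
    ("product/2", "price", "20.00"),
]


def match(triples: Sequence[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def archive_subjects(triples: Sequence[Triple],
                     subjects: Sequence[str]) -> List[Triple]:
    """Select every (property, value) pair of the given subjects.

    The property position is left unbound, which is the shape of query
    the abstract describes as an unbound-property query.
    """
    selected: List[Triple] = []
    for subject in subjects:
        selected.extend(match(triples, s=subject))  # p and o stay unbound
    return selected


archived = archive_subjects(TRIPLES, ["product/1"])
```

Because the property is not fixed, a naive evaluation has to consider every property of every selected subject, which suggests why such queries benefit from dedicated optimization.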

Pages: 117-137

Page count: 21
