Scalable long-term preservation of relational data through SPARQL queries

Cited by: 0
Authors
Stefanova, Silvia [1]
Risch, Tore [1]
Affiliations
[1] Uppsala Univ, Dept Informat Technol, Box 337, SE-75105 Uppsala, Sweden
Keywords
Database archival; benchmark; SPARQL optimization; SPARQL views of relational databases; unbound-property queries
DOI
10.3233/SW-150173
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present an approach for scalable long-term preservation of data stored in relational databases (RDBs) as RDF, implemented in the SAQ (Semantic Archive and Query) system. The approach is suitable for archiving scientific data used in scientific publications, where it is desirable to preserve only parts of an RDB, e.g. only data about a specific set of experimental artefacts in the database. With this approach, long-term preservation as RDF of selected parts of a database is specified as an archival query in an extended SPARQL dialect, A-SPARQL. Query processing is based on automatically generating an RDF view of the relational database to be archived, called the RD-view. A-SPARQL provides flexible selection of the data to be archived in terms of a SPARQL-like query to the RD-view. The result of an archival query is a data archive file containing the RDF triples that represent the relational data content to be preserved. The system also generates a schema archive file in which sufficient metadata are saved to allow the archived database to be fully reconstructed. An archival query usually selects both properties and their values for sets of subjects, which makes the property p in some triple patterns unknown. We call such queries, where properties are unknown, unbound-property queries. To achieve scalable data preservation and recreation, we propose query transformation strategies suitable for optimizing unbound-property queries. These query rewriting strategies were implemented and evaluated in a new benchmark for archival queries called ABench. ABench is defined as a set of typical A-SPARQL queries archiving selected parts of databases generated by the Berlin benchmark data generator. In experiments, the SAQ optimization strategies were evaluated by measuring the performance of A-SPARQL queries selecting triples for archival in ABench. The performance of equivalent SPARQL queries for related systems was also measured. The results showed that the proposed optimizations substantially improve the query execution time for archival queries.
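To make the notion of an unbound-property query concrete, the following is a minimal Python sketch and not SAQ's implementation. The row-to-triple mapping, the URI-like strings, and the helper names `row_to_triples` and `match` are all assumptions introduced for illustration; the abstract does not specify how the RD-view is defined or how the rewriting strategies work. The sketch only contrasts a triple pattern whose property is fixed with one whose property is left as a variable.

```python
# Toy illustration only: the mapping and all names below are hypothetical,
# not the RD-view definition or query processing used by SAQ.

def row_to_triples(table, key_column, row):
    """Map one relational row to (subject, property, value) triples.

    NULL (None) column values produce no triple in this sketch.
    """
    subject = f"{table}/{row[key_column]}"
    return [
        (subject, f"{table}#{column}", value)
        for column, value in row.items()
        if column != key_column and value is not None
    ]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


rows = [
    {"id": 1, "label": "sample A", "mass": 2.5},
    {"id": 2, "label": "sample B", "mass": None},
]
triples = [t for row in rows for t in row_to_triples("artefact", "id", row)]

# Bound property: only the 'label' column of artefact 1 is selected.
bound = match(triples, s="artefact/1", p="artefact#label")

# Unbound property: every property and value of artefact 1 is selected,
# which is the shape an archival query typically has.
unbound = match(triples, s="artefact/1")
```

In this toy setting the unbound pattern has to consider every column of the table, which hints at why such queries over a view of a large relational database may need dedicated optimization.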
Pages: 117-137
Page count: 21