RDFPROV: A relational RDF store for querying and managing scientific workflow provenance

Cited by: 26
Authors
Chebotko, Artem [1 ]
Lu, Shiyong [2 ]
Fei, Xubo [2 ]
Fotouhi, Farshad [2 ]
Affiliations
[1] Univ Texas Pan Amer, Dept Comp Sci, Edinburg, TX 78539 USA
[2] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
Keywords
Provenance; Scientific workflow; Metadata management; Ontology; RDF; OWL; SPARQL-to-SQL translation; Query optimization; RDF store; RDBMS; SEMANTIC WEB; AUTOMATIC CAPTURE; LINEAGE; MANAGEMENT;
DOI
10.1016/j.datak.2010.03.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Provenance metadata has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. The provenance management problem concerns the efficiency and effectiveness of the modeling, recording, representation, integration, storage, and querying of provenance metadata. Our approach to provenance management seamlessly integrates the interoperability, extensibility, and inference advantages of Semantic Web technologies with the storage and querying power of an RDBMS to meet the emerging requirements of scientific workflow provenance management. In this paper, we elaborate on the design of a relational RDF store, called RDFPROV, which is optimized for scientific workflow provenance querying and management. Specifically, we propose: i) two schema mapping algorithms to map an OWL provenance ontology to a relational database schema that is optimized for common provenance queries; ii) three efficient data mapping algorithms to map provenance RDF metadata to relational data according to the generated relational database schema; and iii) a schema-independent SPARQL-to-SQL translation algorithm that is optimized on-the-fly by using the type information of an instance available from the input provenance ontology and the statistics of the sizes of the tables in the database. Experimental results are presented to show that our algorithms are efficient and scalable. The comparison with two popular relational RDF stores, Jena and Sesame, and two commercial native RDF stores, AllegroGraph and BigOWLIM, showed that our optimizations result in improved performance and scalability for provenance metadata management. Finally, our case study for provenance management in a real-life biological simulation workflow showed the production quality and capability of the RDFPROV system.
Although presented in the context of scientific workflow provenance management, many of our proposed techniques apply to general RDF data management as well. © 2010 Elsevier B.V. All rights reserved.
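To make the idea of SPARQL-to-SQL translation over a relational RDF store more concrete, the following is a minimal sketch and not the RDFPROV algorithm itself. It assumes a single generic table `triple(s, p, o)` rather than the ontology-derived schema the abstract describes, and it handles only a basic graph pattern given as a list of triple patterns. The function name `translate_bgp`, the table name, and the `ex:` vocabulary are all hypothetical and chosen for illustration only.

```python
import sqlite3


def translate_bgp(patterns, table="triple"):
    """Translate (s, p, o) triple patterns into one SQL self-join query.

    Terms starting with '?' are variables; all other terms are constants.
    Hypothetical helper for illustration, not RDFPROV's translation algorithm.
    Returns (sql, params).
    """
    columns = ("s", "p", "o")
    first_seen = {}          # variable -> "alias.column" of its first occurrence
    where, params = [], []
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"      # one table alias per triple pattern
        for col, term in zip(columns, pattern):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    # Repeated variable: emit a join condition.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # Constant: bind it as a query parameter.
                where.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_seen.items())
    from_clause = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {from_clause}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


def demo():
    """Load a few made-up provenance triples and run a translated query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO triple VALUES (?, ?, ?)",
        [
            ("run1", "rdf:type", "ex:WorkflowRun"),
            ("run1", "ex:produced", "fileA"),
            ("run2", "rdf:type", "ex:WorkflowRun"),
            ("run2", "ex:produced", "fileB"),
            ("fileB", "ex:derivedFrom", "fileA"),
        ],
    )
    # Roughly: SELECT ?run ?artifact WHERE { ?run rdf:type ex:WorkflowRun .
    #                                        ?run ex:produced ?artifact }
    patterns = [
        ("?run", "rdf:type", "ex:WorkflowRun"),
        ("?run", "ex:produced", "?artifact"),
    ]
    sql, params = translate_bgp(patterns)
    return sorted(conn.execute(sql, params).fetchall())


if __name__ == "__main__":
    print(demo())
```

Each triple pattern becomes one alias of the table, constants become filter conditions, and a variable shared between patterns becomes a join condition. A store with an ontology-derived schema, as the abstract describes, could presumably route a pattern to a smaller class- or property-specific table instead of the generic one, but that step is not shown here.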
Pages: 836-865
Page count: 30