A systematic review of provenance systems

Cited by: 40
Authors
Perez, Beatriz [1]
Rubio, Julio [1]
Saenz-Adan, Carlos [1]
Affiliations
[1] Univ La Rioja, Dept Math & Comp Sci, La Rioja 26004, Spain
Keywords
Provenance systems; Provenance aspects; Computer science; Systematic review; SCIENTIFIC WORKFLOWS; AUTOMATIC CAPTURE; TAVERNA; LINEAGE; MANAGEMENT; FRAMEWORK; MODEL; TOOL; WEB;
DOI
10.1007/s10115-018-1164-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Provenance refers to the entire amount of information, comprising all the elements and their relationships, that contributes to the existence of a piece of data. Knowledge of provenance data brings many benefits, such as verifying a product, reproducing results, sharing and reusing knowledge, and assessing data quality and validity. With such tangible benefits, it is no wonder that in recent years research on provenance has grown exponentially and has been applied to a wide range of scientific disciplines. Some years ago, managing and recording provenance information were performed manually. Given the huge volume of information available nowadays, performing such tasks manually is no longer an option. The problem of systematically performing tasks such as the understanding, capture and management of provenance has gained significant attention from the research community and industry over the past decades. As a consequence, a large number of contributions and provenance systems have been proposed as solutions for such tasks. The overall objective of this paper is to plot the landscape of published systems in the field of provenance, with two main purposes. First, we seek to evaluate the desired characteristics that provenance systems are expected to have. Second, we aim to identify a set of representative systems (both early and recent use) to be exhaustively analyzed according to such characteristics. In particular, we have performed a systematic literature review of studies, identifying a comprehensive set of 105 relevant resources in all. The results show that there are common aspects or characteristics of provenance systems that are widely recognized throughout the literature on the topic. Based on these results, we have defined a six-dimensional taxonomy of provenance characteristics covering general aspects, data capture, data access, subject, storage, and non-functional aspects. Additionally, the study has identified the 25 most referenced provenance systems within the provenance context. This study exhaustively analyzes and compares these systems according to our taxonomy and pinpoints future directions.
Pages: 495-543
Page count: 49
Related papers
134 in total
[1] Agrawal P., 2006, VLDB, P1151
[2] Tioga-2: A direct manipulation database visualization environment [J]. Aiken, A; Chen, J; Stonebraker, M; Woodruff, A. PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1996: 208-217
[3] Static analysis of Taverna workflows to predict provenance patterns [J]. Alper, Pinar; Belhajjame, Khalid; Goble, Carole A. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2017, 75: 310-329
[4] Altintas I, 2006, LECT NOTES COMPUT SC, V4145, P118
[5] Putting Lipstick on Pig: Enabling Database-style Workflow Provenance [J]. Amsterdamer, Yael; Davidson, Susan B.; Deutch, Daniel; Milo, Tova; Stoyanovich, Julia; Tannen, Val. PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 5 (04): 346-357
[6] [Anonymous], 2011, Proc. of Fifth Biennial Conference on Innovative Data Systems Research (CIDR)
[7] [Anonymous], P TAPP 10
[8] [Anonymous], P TAPP 13
[9] [Anonymous], 2010, WEB SERVICES RES EME
[10] [Anonymous], NAT C ART INT VANC B