Putting Lipstick on Pig: Enabling Database-style Workflow Provenance

Cited by: 61
Authors
Amsterdamer, Yael [2 ]
Davidson, Susan B. [1 ]
Deutch, Daniel [3 ]
Milo, Tova [2 ]
Stoyanovich, Julia [1 ]
Tannen, Val [1 ]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Tel Aviv Univ, Tel Aviv, Israel
[3] Ben Gurion Univ Negev, Beer Sheva, Israel
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2011 / Vol. 5 / No. 04
Funding
Israel Science Foundation; US National Science Foundation;
Keywords
DOI
10.14778/2095686.2095693
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing users to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
Pages: 346-357
Page count: 12