Static analysis of Taverna workflows to predict provenance patterns

Cited by: 5
Authors
Alper, Pinar [1]
Belhajjame, Khalid [2]
Goble, Carole A. [1]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Oxford Rd, Manchester M13 9PL, Lancs, England
[2] Univ Paris 09, Pl Marechal Lattre Tassigny, F-75016 Paris, France
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2017 / Vol. 75
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Scientific workflows; Provenance; Annotation; Static analysis;
DOI
10.1016/j.future.2017.01.004
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Workflows have found adoption in scientific domains particularly due to their automation and provenance features. Using workflows, scientists can repeat analyses with different input parameters and later use provenance to access and compare results based on these respective parameters. A common assumption is that by designing an analysis as a workflow we get parameter-to-result traceability for free by using workflow provenance. This assumption holds for cases of coarse-grained traceability where an entire workflow is subjected to repetition and all workflow parameters contribute to all results. However, this assumption is not guaranteed to hold for cases requiring finer-grained traceability: where a workflow is configured with collections of parameters and analyses within a workflow are repeated with combinations of parameters from collections. In this paper, we identify two dimensions that affect fine-grained traceability: (1) Factorial Design, which is the level of granularity in modelling parameters/data in workflows and in provenance that is supported by a workflow system; and (2) the practice of scientists in successfully encoding Factorial Design into workflows. Taverna is a workflow system that provides extensive features for factorial design. However, it also supports a free approach to workflow design, which means that scientists may create workflows which could break traceability in provenance when they run. Using a real-world Taverna workflow, we show how broken traceability manifests in provenance, rendering it ineffective for accessing workflow outputs derived from particular input parameters. In order to prevent broken traceability from occurring, we describe a rule-based static analysis technique which operates over workflow descriptions and anticipates patterns in provenance. Our rules exploit the well-defined execution behaviour in the Taverna system.
In order to understand Factorial Design support in workflow systems in general, we provide a comparative survey. We conclude that other workflow systems also provide constructs for Factorial Design, and, similar to Taverna, they too are prone to broken traceability. (C) 2017 Published by Elsevier B.V.
Pages: 310-329
Page count: 20