Provenance in collection-oriented scientific workflows

Cited by: 29
Authors
Bowers, Shawn [1]
McPhillips, Timothy M. [1]
Ludascher, Bertram [1, 2]
Affiliations
[1] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
Keywords
provenance; collection-oriented scientific workflows; scientific data management
DOI
10.1002/cpe.1226
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
We describe a provenance model tailored to scientific workflows based on the collection-oriented modeling and design paradigm. Our implementation within the Kepler scientific workflow system captures the dependencies of data and collection creation events on preexisting data and collections, and embeds these provenance records within the data stream. A provenance query engine operates on self-contained workflow traces representing serializations of the output data stream for particular workflow runs. We demonstrate this approach in our response to the first provenance challenge. Copyright (C) 2007 John Wiley & Sons, Ltd.
Pages: 519-529
Page count: 11