Tracking provenance in a virtual data grid

Cited by: 32
Authors
Clifford, Ben [1]
Foster, Ian [1, 2]
Voeckler, Jens-S. [3]
Wilder, Michael [1, 2]
Zhao, Yong [1]
Affiliations
[1] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
[2] Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439 USA
[3] USC Informat Sci Inst, Marina Del Rey, CA USA
Keywords
grid computing; workflow; data provenance
DOI
10.1002/cpe.1256
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The virtual data model allows data sets to be described prior to, and separately from, their physical materialization. We have implemented this model in a Virtual Data Language (VDL) and associated supporting tools, which provide for both the storage, query, and retrieval of virtual data set descriptions, and the automated, on-demand materialization of virtual data sets. We use a standardized data provenance challenge exercise to illustrate the powerful queries that can be performed on the data maintained by these tools, which for a single virtual data set can include three elements: the computational procedure(s) that must be executed to materialize the data set, the runtime log(s) produced by the execution of the computation(s), and optional metadata annotation(s) that associate application semantics with data and procedures. Copyright (C) 2007 John Wiley & Sons, Ltd.
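The three elements the abstract lists for a virtual data set (the procedure that materializes it, the runtime logs of its execution, and optional metadata annotations) can be pictured with a small sketch. This is not VDL syntax and not the authors' tooling: the `Derivation` and `Catalog` classes, their fields, and the example data set names are all hypothetical, chosen only to illustrate how a lineage query and an annotation query might work over such records.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Derivation:
    """Hypothetical record: one procedure call that materializes data sets."""
    name: str
    inputs: list[str]
    outputs: list[str]
    run_logs: list[str] = field(default_factory=list)            # runtime logs
    annotations: dict[str, str] = field(default_factory=dict)    # metadata


class Catalog:
    """Hypothetical in-memory store of derivations, keyed by output data set."""

    def __init__(self) -> None:
        self._producer: dict[str, Derivation] = {}

    def add(self, derivation: Derivation) -> None:
        for out in derivation.outputs:
            self._producer[out] = derivation

    def lineage(self, dataset: str) -> list[str]:
        """Names of the derivations a data set depends on, nearest first."""
        order: list[str] = []
        queue = deque([dataset])
        while queue:
            producer = self._producer.get(queue.popleft())
            if producer is None or producer.name in order:
                continue  # raw input, or derivation already visited
            order.append(producer.name)
            queue.extend(producer.inputs)
        return order

    def find_by_annotation(self, key: str, value: str) -> list[str]:
        """Names of derivations carrying the given annotation, sorted."""
        names = {d.name for d in self._producer.values()
                 if d.annotations.get(key) == value}
        return sorted(names)


# Illustrative three-step pipeline; all names are made up.
catalog = Catalog()
catalog.add(Derivation("align", ["raw.img"], ["aligned.img"],
                       annotations={"stage": "preprocess"}))
catalog.add(Derivation("slice", ["aligned.img"], ["slice.pgm"],
                       run_logs=["exit status 0"]))
catalog.add(Derivation("convert", ["slice.pgm"], ["atlas.gif"],
                       annotations={"stage": "render"}))

print(catalog.lineage("atlas.gif"))
print(catalog.find_by_annotation("stage", "preprocess"))
```

Because each derivation is recorded separately from any physical file, the lineage of `atlas.gif` can be answered from the catalog alone, whether or not the file has been materialized yet.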
Pages: 565-575
Page count: 11