Managing Data Provenance in Genome Project Workflows

Times cited: 0
Authors
de Paula, Renato [1]
Holanda, Maristela T. [1]
Walter, Maria Emilia M. T. [1]
Lifschitz, Sergio [2]
Affiliations
[1] Univ Brasilia UnB, Dept Comp Sci, Brasilia, DF, Brazil
[2] Pontifical Cathol Univ, Dept Informat, Rio De Janeiro, Brazil
Source
2012 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW) | 2012
Keywords
data provenance; PROV-DM; genome project; workflow; bioinformatics; MODEL;
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
In this article, we propose the application of the PROV-DM model to manage data provenance for workflows designed to support genome projects. This provenance model aims at storing details of each execution of the workflow, which include raw and produced data, computational tools and versions, parameters, and so on. This way, biologists can review details of a particular workflow execution, compare information generated among different executions, and plan new ones more efficiently. In addition, we have created a provenance simulator to facilitate the inclusion of a provenance data model in genome projects. In order to validate our proposal, we discuss a case study of an RNA-Seq project that aims to identify, measure and compare RNA expression levels across liver and kidney RNA samples produced by high-throughput automatic sequencers.
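As a rough illustration of what the abstract describes, the sketch below records workflow steps using three PROV-DM relations (used, wasGeneratedBy, wasAssociatedWith) and traces which data a result was derived from. It is not the authors' implementation or their simulator: the class names, file names, tool names, versions, and parameters are all hypothetical, chosen only to resemble an RNA-Seq run.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    """PROV-DM agent: here, a computational tool at a specific version."""
    id: str
    version: str


@dataclass(frozen=True)
class Activity:
    """PROV-DM activity: one step of one workflow execution."""
    id: str
    parameters: tuple = ()  # (name, value) pairs; hypothetical representation


@dataclass
class ProvenanceStore:
    """In-memory store for three PROV-DM relations (illustrative only)."""
    used: list = field(default_factory=list)                 # (activity, entity)
    was_generated_by: list = field(default_factory=list)     # (entity, activity)
    was_associated_with: list = field(default_factory=list)  # (activity, agent, version)

    def record_step(self, activity, agent, inputs, outputs):
        """Record one executed step: its tool, its inputs, and its outputs."""
        self.was_associated_with.append((activity.id, agent.id, agent.version))
        self.used.extend((activity.id, entity) for entity in inputs)
        self.was_generated_by.extend((entity, activity.id) for entity in outputs)

    def lineage(self, entity_id):
        """Return every entity that entity_id was directly or indirectly derived from."""
        upstream = set()
        pending = [entity_id]
        while pending:
            current = pending.pop()
            for entity, activity_id in self.was_generated_by:
                if entity != current:
                    continue
                for act, source in self.used:
                    if act == activity_id and source not in upstream:
                        upstream.add(source)
                        pending.append(source)
        return upstream


# Hypothetical two-step run: quality filtering, then mapping to a reference.
store = ProvenanceStore()
store.record_step(
    Activity("filter-run1", (("min_quality", 20),)),
    Agent("filter-tool", "1.0"),
    inputs=["liver_raw.fastq"],
    outputs=["liver_filtered.fastq"],
)
store.record_step(
    Activity("map-run1"),
    Agent("mapper", "2.3"),
    inputs=["liver_filtered.fastq", "reference.fa"],
    outputs=["liver_mapped.sam"],
)
sources = store.lineage("liver_mapped.sam")
```

Storing the tool version with each activity is what would let two executions of the same step be compared later, which is the use the abstract highlights.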
Pages: 8