Challenges of Provenance in Scientific Workflow Management Systems

Times cited: 2
Authors
Alam, Khairul [1]
Roy, Banani [1]
Affiliation
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK, Canada
Source
2022 IEEE/ACM WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE, WORKS | 2022
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Scientific workflow; scientific workflow management system; provenance; reusability; open science; semantic web; e-science; automatic capture
DOI
10.1109/WORKS56498.2022.00007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Scientific workflows are one of the well-established pillars of large-scale computational science and have emerged as a torchbearer for formalizing and structuring massive amounts of complex, heterogeneous data and for accelerating scientific progress. With the help of scientific workflow management systems (SWfMSs), a workflow can analyze terabyte-scale datasets, contain numerous individual tasks, and coordinate heterogeneous tasks. SWfMSs support the automation of repetitive tasks and capture complex analyses through workflows. However, executing workflows is costly and resource-intensive. At different phases of the workflow life cycle, most SWfMSs store provenance information, enabling result reproducibility, sharing, and knowledge reuse in the scientific community. However, this provenance information can be many times larger than the workflow and its input data, and managing it grows more complex as applications scale. Data-intensive computing across application fields therefore demands ways to handle exponentially increasing data volumes and to make efficient use of storage and computing resources. This paper documents the challenges of provenance management and reuse in e-science, focusing primarily on scientific workflow approaches, by exploring different SWfMSs and provenance management systems. We also investigate ways to overcome these challenges.
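The kind of provenance capture the abstract attributes to SWfMSs can be illustrated with a minimal sketch. It assumes a toy in-memory store; the names `ProvenanceStore`, `ProvenanceRecord`, `run`, `lineage`, `size_bytes`, and `digest` are hypothetical and are not the API of any SWfMS or provenance system discussed in the paper. For each task execution it records content hashes of the inputs and output plus timestamps, then walks those records backwards to recover an output's lineage.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List


def digest(value: Any) -> str:
    """Content hash of a JSON-serializable value; stands in for a data artifact ID."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class ProvenanceRecord:
    """Retrospective provenance for one task execution (hypothetical schema)."""
    task: str
    inputs: Dict[str, str]  # parameter name -> content hash of the argument
    output: str             # content hash of the returned value
    started: float          # wall-clock start time (seconds since epoch)
    ended: float            # wall-clock end time


@dataclass
class ProvenanceStore:
    """Toy in-memory provenance store; a real system would persist these records."""
    records: List[ProvenanceRecord] = field(default_factory=list)

    def run(self, name: str, func: Callable[..., Any], **kwargs: Any) -> Any:
        """Execute one workflow task and record what it consumed and produced."""
        started = time.time()
        result = func(**kwargs)
        ended = time.time()
        self.records.append(ProvenanceRecord(
            task=name,
            inputs={key: digest(val) for key, val in kwargs.items()},
            output=digest(result),
            started=started,
            ended=ended,
        ))
        return result

    def lineage(self, output_hash: str) -> List[str]:
        """Return the tasks that derived output_hash, newest first."""
        chain: List[str] = []
        frontier = {output_hash}
        # Records are in execution order, so one backward pass reaches every upstream task.
        for rec in reversed(self.records):
            if rec.output in frontier:
                chain.append(rec.task)
                frontier.update(rec.inputs.values())
        return chain

    def size_bytes(self) -> int:
        """Size of the provenance log when serialized as JSON."""
        return len(json.dumps([asdict(r) for r in self.records]).encode("utf-8"))


if __name__ == "__main__":
    store = ProvenanceStore()
    raw = store.run("load", lambda: [3, 1, 2])
    clean = store.run("sort", lambda data: sorted(data), data=raw)
    total = store.run("sum", lambda data: sum(data), data=clean)
    print(total)                         # 6
    print(store.lineage(digest(total)))  # ['sum', 'sort', 'load']
    print(store.size_bytes(), "bytes of provenance vs", len(json.dumps(raw)), "bytes of input")
```

Even in this three-task toy run, the serialized provenance log is larger than the input list it describes, a small-scale echo of the abstract's point that provenance can outgrow the data itself. Production systems often map such records onto a standard model such as W3C PROV rather than an ad hoc schema like this one.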
Pages: 10-18
Page count: 9