Polyflow: A SOA for Analyzing Workflow Heterogeneous Provenance Data in Distributed Environments

Cited by: 2
Authors
Mendes, Yan [1]
Braga, Regina [1]
Stroele, Victor [1]
de Oliveira, Daniel [2]
Affiliations
[1] Univ Fed Juiz de Fora UFJF, Programa Posgrad Ciencia Computacao, Juiz De Fora, MG, Brazil
[2] Univ Fed Fluminense UFF, Inst Computacao, Niteroi, RJ, Brazil
Source
PROCEEDINGS OF THE XV BRAZILIAN SYMPOSIUM ON INFORMATION SYSTEMS, SBSI 2019: Complexity on Modern Information Systems | 2019
Keywords
Workflows interoperability; heterogeneous provenance data integration; polystore; SCIENTIFIC WORKFLOWS; BIG DATA; SCIENCE; ANALYTICS; SYSTEMS; MODEL
DOI
10.1145/3330204.3330259
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the last decade, the (big) data-driven science paradigm has become a widespread reality. However, this approach has limitations, such as the dependence of its performance on data quality and the lack of reproducibility of results. To enable reproducibility, many tools, such as Workflow Management Systems, were developed to formalize process pipelines and capture execution traces. However, interoperating the data generated by these solutions became a problem, since most systems adopted proprietary data models. To support interoperability across heterogeneous provenance data, we propose a Service-Oriented Architecture with a polystore storage design in which provenance is conceptually represented using the ProvONE model. A wrapper layer is responsible for transforming data described in heterogeneous formats into ProvONE-compliant data. Moreover, we propose a query layer that provides location and access transparency to users. Furthermore, we conduct two feasibility studies showcasing real use-case scenarios. First, we illustrate how two research groups can compare their processes and results. Second, we show how our architecture can be used as a queryable provenance repository. We show Polyflow's viability for both scenarios using the Goal-Question-Metric methodology. Finally, we show the usability and extensibility appeal of our solution by comparing it to similar approaches.
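The wrapper layer and query layer described in the abstract can be pictured with a minimal sketch. This is not Polyflow's implementation: the `Execution` record is only loosely modeled on ProvONE's notion of an execution of a program, and every name here (`wrap_system_a`, `wrap_system_b`, `QueryLayer`, the field names, the two native formats) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Execution:
    """Simplified, hypothetical stand-in for a ProvONE-style execution record."""
    program: str      # name of the workflow step that ran
    started_at: str   # ISO 8601 timestamp, so string order equals time order
    source: str       # backing store the record came from


# Wrapper layer (assumed shape): one wrapper per native format,
# each converting its input into the common Execution record.
def wrap_system_a(rows: Iterable[dict]) -> List[Execution]:
    """Hypothetical system A exports dictionaries with 'task' and 'start' keys."""
    return [Execution(r["task"], r["start"], "system_a") for r in rows]


def wrap_system_b(rows: Iterable[tuple]) -> List[Execution]:
    """Hypothetical system B exports (name, timestamp) tuples."""
    return [Execution(name, ts, "system_b") for (name, ts) in rows]


class QueryLayer:
    """Answers queries without the caller knowing where the data is stored."""

    def __init__(self) -> None:
        self._stores: Dict[str, Callable[[], List[Execution]]] = {}

    def register(self, name: str, loader: Callable[[], List[Execution]]) -> None:
        # Each loader hides how its store is reached and read.
        self._stores[name] = loader

    def executions_of(self, program: str) -> List[Execution]:
        # Fan the query out to every registered store, then merge by start time.
        results: List[Execution] = []
        for loader in self._stores.values():
            results.extend(e for e in loader() if e.program == program)
        return sorted(results, key=lambda e: e.started_at)


layer = QueryLayer()
layer.register("a", lambda: wrap_system_a(
    [{"task": "align", "start": "2019-01-02T10:00:00"}]))
layer.register("b", lambda: wrap_system_b(
    [("align", "2019-01-01T09:00:00"), ("plot", "2019-01-03T08:00:00")]))
hits = layer.executions_of("align")  # records from both stores, oldest first
```

In this sketch the caller asks only for a program name; which store holds each record, and in what native format, stays behind the wrappers and loaders, which is the kind of location and access transparency the abstract attributes to the query layer.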
Pages: 8