The Bounded Data Reuse Problem in Scientific Workflows

Cited by: 6
Authors
Zohrevandi, Mohsen [1]
Bazzi, Rida A. [1]
Affiliation
[1] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, Tempe, AZ 85287 USA
Source
IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013) | 2013
Keywords
Scientific Workflows; Intermediate Data; Data Reuse; Series-Parallel
DOI
10.1109/IPDPS.2013.71
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Large datasets and time-consuming processes have become the norm in scientific computing applications. The exploration phase in the development of scientific workflows involves trial and error with workflow components, which can take a long time because the workflow tasks themselves are time-consuming. This suggests that development time can be reduced by reusing intermediate data whenever possible. However, storage space is always limited. This raises a problem: given a limited amount of storage, which intermediate datasets from one workflow should be kept for reuse in another workflow? For the general class of series-parallel graphs, we model this problem using a non-linear integer programming formulation and show that it is NP-hard. We provide an optimal branch-and-bound algorithm as well as efficient heuristics. We conducted experiments over a large set of randomly generated workflows as well as a smaller set of synthetic workflows based on real-world workflows used by scientists in different disciplines. Our experiments show that the best solution produced by the heuristics differs from the optimal value by less than 1% on average.
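The selection problem described in the abstract can be illustrated with a toy sketch. The abstract does not give the paper's formulation, so everything here is an assumption made for illustration: the four-task workflow, its run times and output sizes, the helper names `recompute_time` and `best_selection`, and the use of exhaustive search in place of the authors' branch-and-bound algorithm and heuristics. It only shows why the savings interact: once a downstream dataset is kept, keeping its inputs no longer saves anything.

```python
from itertools import combinations

# Hypothetical toy workflow (not from the paper):
# task name -> (run time, output size, input datasets).
WORKFLOW = {
    "a": (10, 4, []),
    "b": (20, 3, ["a"]),
    "c": (5, 2, ["a"]),
    "d": (8, 1, ["b", "c"]),
}


def recompute_time(targets, stored):
    """Total run time of the tasks that must be re-executed to obtain
    `targets` when the datasets in `stored` are already available."""
    needed, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name in stored or name in needed:
            continue  # already on disk, or already scheduled
        needed.add(name)
        stack.extend(WORKFLOW[name][2])  # its inputs are required too
    return sum(WORKFLOW[name][0] for name in needed)


def best_selection(targets, budget):
    """Exhaustive search (exponential, toy sizes only) for the set of
    stored datasets within `budget` that minimizes recompute time."""
    names = sorted(WORKFLOW)
    best = (recompute_time(targets, set()), ())
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            size = sum(WORKFLOW[n][1] for n in subset)
            if size > budget:
                continue
            cost = recompute_time(targets, set(subset))
            if cost < best[0]:
                best = (cost, subset)
    return best


if __name__ == "__main__":
    # Assume the next workflow modifies task "d", so it needs "b" and "c".
    for budget in (0, 3, 5):
        print(budget, best_selection(["b", "c"], budget))
```

In this toy instance, keeping "a" helps only while "b" and "c" are not both kept, which is the kind of interaction a purely linear objective would not capture.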
Pages: 1051-1062
Page count: 12
Related papers
50 in total
  • [21] Measuring the impact of burst buffers on data-intensive scientific workflows
    da Silva, Rafael Ferreira
    Callaghan, Scott
    Tu Mai Anh Do
    Papadimitriou, George
    Deelman, Ewa
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 101 : 208 - 220
  • [22] From monitoring data to experiment information - Monitoring of grid scientific workflows
    Balis, Bartosz
    Bubak, Marian
    Pelczar, Michal
    E-SCIENCE 2007: THIRD IEEE INTERNATIONAL CONFERENCE ON E-SCIENCE AND GRID COMPUTING, PROCEEDINGS, 2007, : 77 - +
  • [23] Integration of heterogeneous scientific data using workflows - A case study in bioinformatics
    Vouk, MA
    ITI 2003: PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2003, : 25 - 28
  • [24] Data reduction in scientific workflows using provenance monitoring and user steering
    Souza, Renan
    Silva, Vitor
    Coutinho, Alvaro L. G. A.
    Valduriez, Patrick
    Mattoso, Marta
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 110 (110): : 481 - 501
  • [25] Monitoring of Grid scientific workflows
    Balis, Bartosz
    Bubak, Marian
    Labno, Bartlomiej
    SCIENTIFIC PROGRAMMING, 2008, 16 (2-3) : 205 - 216
  • [26] Reproducibility Analysis of Scientific Workflows
    Banati, Anna
    Kacsuk, Peter
    Kozlovszky, Miklos
    ACTA POLYTECHNICA HUNGARICA, 2017, 14 (02) : 201 - 217
  • [27] Characterizing and profiling scientific workflows
    Juve, Gideon
    Chervenak, Ann
    Deelman, Ewa
    Bharathi, Shishir
    Mehta, Gaurang
    Vahi, Karan
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2013, 29 (03): : 682 - 692
  • [28] Examining the challenges of scientific workflows
    Gil, Yolanda
    Deelman, Ewa
    Ellisman, Mark
    Fahringer, Thomas
    Fox, Geoffrey
    Gannon, Dennis
    Goble, Carole
    Livny, Miron
    Moreau, Luc
    Myers, Jim
    COMPUTER, 2007, 40 (12) : 24 - +
  • [29] Protecting scientific workflows in clouds with an intrusion tolerant system
    Wang, Yawen
    Guo, Yunfei
    Guo, Zehua
    Liu, Wenyan
    Yang, Chao
    IET INFORMATION SECURITY, 2020, 14 (02) : 157 - 165
  • [30] The Planck/LFI data processing: real-time analysis, data management and scientific workflows
    Frailis, M.
    Zacchei, A.
    Maris, M.
    Morisset, N.
    Rohlfs, R.
    Meharga, M.
    Binko, P.
    Turler, M.
    Galeotta, S.
    Lowe, S. R.
    Maino, D.
    Maggio, G.
    Pasian, F.
    Perrotta, F.
    Sandri, M.
    Ensslin, T.
    Reinecke, M.
    Knoche, J.
    Rachen, J.
    Hovest, W.
    Giardino, G.
    Bremer, M.
    ASTROPARTICLE, PARTICLE AND SPACE PHYSICS, DETECTORS AND MEDICAL PHYSICS APPLICATIONS, 2010, 5 : 709 - 718