The Bounded Data Reuse Problem in Scientific Workflows

Cited by: 6
Authors
Zohrevandi, Mohsen [1]
Bazzi, Rida A. [1]
Affiliations
[1] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, Tempe, AZ 85287 USA
Source
IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013) | 2013
Keywords
Scientific Workflows; Intermediate Data; Data Reuse; Series-Parallel
DOI
10.1109/IPDPS.2013.71
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Large datasets and time-consuming processes have become the norm in scientific computing applications. The exploration phase in the development of scientific workflows involves trial and error with workflow components, which can take a long time because the workflow tasks themselves are time-consuming. This suggests that development time can be reduced by reusing intermediate data whenever possible. However, storage space is always limited. This raises the question of which intermediate datasets from one workflow should be kept for reuse in another workflow when storage is limited. For the general class of series-parallel graphs, we model this problem using a non-linear integer programming formulation and show that it is NP-hard. We provide an optimal branch-and-bound algorithm as well as efficient heuristics. We conducted experiments over a large set of randomly generated workflows as well as a smaller set of synthetic workflows based on real-world workflows used by scientists in different disciplines. Our experiments show that the best solution produced by the heuristics differs from the optimal value by less than 1% on average.
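
To make the selection problem concrete, the following is a minimal sketch under stated assumptions, not the formulation or algorithm from the paper. It assumes a hypothetical four-task workflow (`TASKS`), a hypothetical set of tasks that change in the next workflow (`CHANGED`), and a toy rule: a task must be re-executed if it changed, or if its output is not stored and some consumer of that output must be re-executed. The search is plain exhaustive enumeration, which is feasible only for tiny instances.

```python
from itertools import combinations

# Hypothetical toy workflow (assumption, not from the paper):
# task -> (runtime, output_size, consumers of its output)
TASKS = {
    "a": (10, 4, ["b", "c"]),
    "b": (20, 3, ["d"]),
    "c": (5, 2, ["d"]),
    "d": (1, 1, []),
}
# Hypothetical: task "d" is modified in the next workflow, so it always reruns.
CHANGED = {"d"}


def rerun_time(stored):
    """Total runtime of tasks that must be re-executed given stored outputs."""
    memo = {}

    def must_run(task):
        if task not in memo:
            consumers = TASKS[task][2]
            # Rerun if changed, or if the output is missing and still needed.
            memo[task] = task in CHANGED or (
                task not in stored and any(must_run(c) for c in consumers)
            )
        return memo[task]

    return sum(TASKS[t][0] for t in TASKS if must_run(t))


def best_selection(budget):
    """Exhaustively pick the outputs to store within the storage budget."""
    candidates = [t for t in TASKS if t not in CHANGED]
    best = (rerun_time(set()), ())
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if sum(TASKS[t][1] for t in subset) <= budget:
                best = min(best, (rerun_time(set(subset)), subset))
    return best


if __name__ == "__main__":
    for budget in (0, 3, 5):
        print(budget, best_selection(budget))
```

In this toy instance, storing the output of `b` alone saves 20 time units and storing the output of `c` alone saves 5, but storing both saves 35 because task `a` no longer has to run. Savings that do not add up across datasets in this way are one plausible reason a formulation of the problem would be non-linear.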
Pages: 1051-1062
Number of pages: 12