MaDaTS: Managing Data on Tiered Storage for Scientific Workflows

Cited by: 13
Authors
Ghoshal, Devarshi [1]
Ramakrishnan, Lavanya [1]
Affiliations
[1] Lawrence Berkeley Natl Lab, 1 Cyclotron Rd, Berkeley, CA 94720 USA
Source
HPDC'17: PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING | 2017
Keywords
Data management; scientific workflows; multi-tiered storage; burst buffer
DOI
10.1145/3078597.3078611
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing the input, output and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad-hoc mechanisms. Technological advances in future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes make managing data for scientific workflows more complex. Thus, we need to manage scientific workflow data across the tiered storage system on HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance and scalability gains of MaDaTS compared to the traditional approach of managing data in scientific workflows.
Pages: 41-52
Page count: 12
Related papers
50 in total
  • [31] Measuring the impact of burst buffers on data-intensive scientific workflows
    da Silva, Rafael Ferreira
    Callaghan, Scott
    Tu Mai Anh Do
    Papadimitriou, George
    Deelman, Ewa
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 101 : 208 - 220
  • [32] Hypermedia workflow: a new approach to data-driven scientific workflows
    Balis, Bartosz
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 100 - 107
  • [33] Monitoring of Grid scientific workflows
    Balis, Bartosz
    Bubak, Marian
    Labno, Bartlomiej
    SCIENTIFIC PROGRAMMING, 2008, 16 (2-3) : 205 - 216
  • [34] Reproducibility Analysis of Scientific Workflows
    Banati, Anna
    Kacsuk, Peter
    Kozlovszky, Miklos
    ACTA POLYTECHNICA HUNGARICA, 2017, 14 (02) : 201 - 217
  • [35] QoS Support for Scientific Workflows using Software-Defined Storage Resource Enclaves
    Karki, Suman
    Nguyen, Bao
    Zhang, Xuechen
    2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2018, : 95 - 104
  • [36] Examining the challenges of scientific workflows
    Gil, Yolanda
    Deelman, Ewa
    Ellisman, Mark
    Fahringer, Thomas
    Fox, Geoffrey
    Gannon, Dennis
    Goble, Carole
    Livny, Miron
    Moreau, Luc
    Myers, Jim
    COMPUTER, 2007, 40 (12) : 24 - +
  • [37] Characterizing and profiling scientific workflows
    Juve, Gideon
    Chervenak, Ann
    Deelman, Ewa
    Bharathi, Shishir
    Mehta, Gaurang
    Vahi, Karan
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2013, 29 (03) : 682 - 692
  • [38] AiiDAlab - an ecosystem for developing, executing, and sharing scientific workflows
    Yakutovich, Aliaksandr V.
    Eimre, Kristjan
    Schutt, Ole
    Talirz, Leopold
    Adorf, Carl S.
    Andersen, Casper W.
    Ditler, Edward
    Du, Dou
    Passerone, Daniele
    Smit, Berend
    Marzari, Nicola
    Pizzi, Giovanni
    Pignedoli, Carlo A.
    COMPUTATIONAL MATERIALS SCIENCE, 2021, 188
  • [39] The Planck/LFI data processing: real-time analysis, data management and scientific workflows
    Frailis, M.
    Zacchei, A.
    Maris, M.
    Morisset, N.
    Rohlfs, R.
    Meharga, M.
    Binko, P.
    Turler, M.
    Galeotta, S.
    Lowe, S. R.
    Maino, D.
    Maggio, G.
    Pasian, F.
    Perrotta, F.
    Sandri, M.
    Ensslin, T.
    Reinecke, M.
    Knoche, J.
    Rachen, J.
    Hovest, W.
    Giardino, G.
    Bremer, M.
    ASTROPARTICLE, PARTICLE AND SPACE PHYSICS, DETECTORS AND MEDICAL PHYSICS APPLICATIONS, 2010, 5 : 709 - 718
  • [40] Towards autonomic data management for staging-based coupled scientific workflows
    Jin, T.
    Zhang, F.
    Sun, Q.
    Romanus, M.
    Bui, H.
    Parashar, M.
    Journal of Parallel and Distributed Computing, 2020, 146 : 35 - 51