MaDaTS: Managing Data on Tiered Storage for Scientific Workflows

Cited by: 13
Authors
Ghoshal, Devarshi [1]
Ramakrishnan, Lavanya [1]
Affiliations
[1] Lawrence Berkeley Natl Lab, 1 Cyclotron Rd, Berkeley, CA 94720 USA
Source
HPDC'17: PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING | 2017
Keywords
Data management; scientific workflows; multi-tiered storage; burst buffer
DOI
10.1145/3078597.3078611
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing the input, output, and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad-hoc mechanisms. Technology advances for future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes introduce additional complexities to managing data for scientific workflows. Thus, we need to manage the scientific workflow data across the tiered storage system on HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance, and scalability gains of MaDaTS as compared to the traditional approach of managing data in scientific workflows.
Pages: 41 - 52
Page count: 12
Related papers
50 in total
  • [21] Scientific workflows for bibliometrics
    Guler, Arzu Tugce
    Waaijer, Cathelijn J. F.
    Palmblad, Magnus
    SCIENTOMETRICS, 2016, 107 (02) : 385 - 398
  • [22] A Community Roadmap for Scientific Workflows Research and Development
    da Silva, Rafael Ferreira
    Casanova, Henri
    Chard, Kyle
    Altintas, Ilkay
    Badia, Rosa M.
    Balis, Bartosz
    Coleman, Taina
    Coppens, Frederik
    Di Natale, Frank
    Enders, Bjoern
    Fahringer, Thomas
    Filgueira, Rosa
    Fursin, Grigori
    Garijo, Daniel
    Goble, Carole
    Howell, Dorran
    Jha, Shantenu
    Katz, Daniel S.
    Laney, Daniel
    Leser, Ulf
    Malawski, Maciej
    Mehta, Kshitij
    Pottier, Loic
    Ozik, Jonathan
    Peterson, J. Luc
    Ramakrishnan, Lavanya
    Soiland-Reyes, Stian
    Thain, Douglas
    Wolf, Matthew
    PROCEEDINGS OF 16TH WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE (WORKS21), 2021, : 81 - 90
  • [23] Multi-objective scheduling of extreme data scientific workflows in Fog
    De Maio, Vincenzo
    Kimovski, Dragi
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 106 : 171 - 184
  • [24] Processing and Managing Scientific Data in SOA Environment
    Shishedjiev, Bogdan
    Goranova, Mariana
    Georgieva, Juliana
    Gancheva, Veska
    AIC '09: PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE ON APPLIED INFORMATICS AND COMMUNICATIONS: RECENT ADVANCES IN APPLIED INFORMAT AND COMMUNICATIONS, 2009, : 25 - +
  • [25] FAIR data pipeline: provenance-driven data management for traceable scientific workflows
    Mitchell, Sonia Natalie
    Lahiff, Andrew
    Cummings, Nathan
    Hollocombe, Jonathan
    Boskamp, Bram
    Field, Ryan
    Reddyhoff, Dennis
    Zarebski, Kristian
    Wilson, Antony
    Viola, Bruno
    Burke, Martin
    Archibald, Blair
    Bessell, Paul
    Blackwell, Richard
    Boden, Lisa A. A.
    Brett, Alys
    Brett, Sam
    Dundas, Ruth
    Enright, Jessica
    Gonzalez-Beltran, Alejandra N. N.
    Harris, Claire
    Hinder, Ian
    Hughes, Christopher David
    Knight, Martin
    Mano, Vino
    McMonagle, Ciaran
    Mellor, Dominic
    Mohr, Sibylle
    Marion, Glenn
    Matthews, Louise
    McKendrick, Iain J. J.
    Pooley, Christopher Mark
    Porphyre, Thibaud
    Reeves, Aaron
    Townsend, Edward
    Turner, Robert
    Walton, Jeremy
    Reeve, Richard
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2022, 380 (2233):
  • [26] From monitoring data to experiment information - Monitoring of grid scientific workflows
    Balis, Bartosz
    Bubak, Marian
    Pelczar, Michal
    E-SCIENCE 2007: THIRD IEEE INTERNATIONAL CONFERENCE ON E-SCIENCE AND GRID COMPUTING, PROCEEDINGS, 2007, : 77 - +
  • [27] On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
    da Silva, Rafael Ferreira
    Callaghan, Scott
    Deelman, Ewa
    PROCEEDINGS OF WORKS 2017: 12TH WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE, 2017,
  • [28] Data reduction in scientific workflows using provenance monitoring and user steering
    Souza, Renan
    Silva, Vitor
    Coutinho, Alvaro L. G. A.
    Valduriez, Patrick
    Mattoso, Marta
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 110 : 481 - 501
  • [29] Integration of heterogeneous scientific data using workflows - A case study in bioinformatics
    Vouk, MA
    ITI 2003: PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2003, : 25 - 28
  • [30] Managing large volumes of distributed scientific data
    Johnston, Steven
    Fangohr, Hans
    Cox, Simon J.
    COMPUTATIONAL SCIENCE - ICCS 2008, PT 3, 2008, 5103 : 339 - 348