MaDaTS: Managing Data on Tiered Storage for Scientific Workflows

Cited by: 13
Authors
Ghoshal, Devarshi [1]
Ramakrishnan, Lavanya [1]
Affiliations
[1] Lawrence Berkeley Natl Lab, 1 Cyclotron Rd, Berkeley, CA 94720 USA
Source
HPDC'17: PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING | 2017
Keywords
Data management; scientific workflows; multi-tiered storage; burst buffer
DOI
10.1145/3078597.3078611
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing the input, output and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad-hoc mechanisms. Technology advances for future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes introduce additional complexities to managing data for scientific workflows. Thus, we need to manage the scientific workflow data across the tiered storage system on HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance and scalability gains of MaDaTS as compared to the traditional approach of managing data in scientific workflows.
Pages: 41-52
Page count: 12