MaDaTS: Managing Data on Tiered Storage for Scientific Workflows

Cited by: 13
Authors
Ghoshal, Devarshi [1]
Ramakrishnan, Lavanya [1]
Affiliations
[1] Lawrence Berkeley Natl Lab, 1 Cyclotron Rd, Berkeley, CA 94720 USA
Source
HPDC'17: PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING | 2017
Keywords
Data management; scientific workflows; multi-tiered storage; burst buffer
DOI
10.1145/3078597.3078611
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing the input, output and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad-hoc mechanisms. Technology advances for future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes introduce additional complexities in managing data for scientific workflows. Thus, we need to manage the scientific workflow data across the tiered storage system on HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance and scalability gains of MaDaTS as compared to the traditional approach of managing data in scientific workflows.
Pages: 41-52
Number of pages: 12