ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems

Cited by: 31
Authors
Byna, Suren [1]
Breitenfeld, M. Scot [2]
Dong, Bin [1]
Koziol, Quincey [1]
Pourmal, Elena [2]
Robinson, Dana [2]
Soumagne, Jerome [2]
Tang, Houjun [1]
Vishwanath, Venkatram [3]
Warren, Richard [2]
Affiliations
[1] Lawrence Berkeley Natl Lab, Berkeley, CA 94597 USA
[2] HDF Grp, Champaign, IL 61820 USA
[3] Argonne Natl Lab, Lemont, IL 60439 USA
Keywords
parallel I/O; Hierarchical Data Format version 5 (HDF5); I/O performance; virtual object layer; HDF5; optimizations
DOI
10.1007/s11390-020-9822-9
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology that enables moving data between compute nodes and storage, faces monumental challenges from the new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy expands to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. We introduce these features, their implementations, and the performance and feature benefits to applications and other libraries.
Pages: 145-160
Number of pages: 16