Scheduling Distributed I/O Resources in HPC Systems

Authors
Bandet, Alexis [1]
Boito, Francieli [1]
Pallez, Guillaume [2]
Affiliations
[1] Univ Bordeaux, CNRS, Bordeaux INP, Inria, LaBRI, UMR 5800, F-33400 Talence, France
[2] INRIA, Rennes, France
Source
EURO-PAR 2024: PARALLEL PROCESSING, PT I | 2024 / Vol. 14801
Keywords
HPC; parallel I/O; parallel file system; object storage targets; I/O forwarding; scheduling; resource allocation
DOI
10.1007/978-3-031-69577-3_10
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
This paper presents a comprehensive investigation into optimizing I/O performance when applications access distributed I/O resources in high-performance computing (HPC) environments. I/O resources, such as I/O forwarding nodes and object storage targets (OSTs), are shared by applications: each application has access to a subset of them, and multiple applications can access the same resources. We propose heuristics that schedule these distributed I/O resources in two steps, determining for each application how many resources to use (allocation) and which ones to use (placement). We discuss a wide range of information about application characteristics that the scheduling algorithms can exploit. Although more detailed application knowledge is associated with better performance, we demonstrate that our solutions remain robust in scenarios where information is limited or inaccurate. This research provides insights into the trade-off between the depth of application characterization and the practicality of scheduling I/O resources.
Pages: 137-151
Number of pages: 15