Argo NodeOS: Toward Unified Resource Management for Exascale

Cited by: 7
Authors
Perarnau, Swann [1]
Zounmevo, Judicael A. [1]
Dreher, Matthieu [1]
Van Essen, Brian C. [3]
Gioiosa, Roberto [2]
Iskra, Kamil [1]
Gokhale, Maya B. [3]
Yoshii, Kazutomo [1]
Beckman, Pete [1]
Affiliations
[1] Argonne Natl Lab, Argonne, IL 60439 USA
[2] Pacific Northwest Natl Lab, Richland, WA 99352 USA
[3] Lawrence Livermore Natl Lab, Livermore, CA USA
Source
2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) | 2017
Funding
National Science Foundation (USA)
Keywords
DOI
10.1109/IPDPS.2017.25
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts. We extend the memory management of Linux to be able to subdivide NUMA memory nodes, allowing better resource partitioning among processes running on the same node. We also add support for memory-mapped access to node-local, PCIe-attached NVRAM devices and introduce a new scheduling class targeted at parallel runtimes supporting user-level load balancing. These features are unified into compute containers, a containerization approach focused on providing modern HPC applications with dynamic control over a wide range of kernel interfaces. To keep our approach compatible with industrial containerization products, we also identify contention points for the adoption of containers in HPC settings. Each NodeOS feature is evaluated by using a set of parallel benchmarks, miniapps, and coupled applications consisting of simulation and data analysis components, running on a modern NUMA platform. We observe out-of-the-box performance improvements easily matching, and often exceeding, those observed with expert-optimized configurations on standard OS kernels. Our lightweight approach to resource management retains the many benefits of a full OS kernel that application programmers have learned to depend on, at the same time providing a set of extensions that can be freely mixed and matched to best benefit particular application components.
Pages: 153-162
Page count: 10
Related papers
50 in total
  • [31] Research on Unified Resource Management and Scheduling System in Cloud Environment
    Hua Jiang
    Yanli Xiao
    Wireless Personal Communications, 2018, 102 : 963 - 973
  • [32] TOWARDS EXASCALE DISTRIBUTED DATA MANAGEMENT
    Aloisio, Giovanni
    Fiore, Sandro
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2009, 23 (04): : 398 - 400
  • [33] Power Management Technology for Exascale Computing
    Gao J.-G.
    Gong D.-Y.
    Wu W.
    Zheng Y.
    Zhu Q.
    Wang F.
    Zheng F.
    Jin L.-F.
    Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (07): : 1373 - 1383
  • [34] Scientific Grand Challenges: Toward Exascale Supercomputing and Beyond
    Getov, Vladimir
    COMPUTER, 2015, 48 (11) : 12 - 14
  • [35] PMIx: Process management for exascale environments
    Castain, Ralph H.
    Hursey, Joshua
    Bouteiller, Aurelien
    Solt, David
    PARALLEL COMPUTING, 2018, 79 : 9 - 29
  • [36] Systemwide Power Management with Argo
    Ellsworth, Daniel
    Patki, Tapasya
    Perarnau, Swann
    Seo, Sangmin
    Amer, Abdelhalim
    Zounmevo, Judicael
    Gupta, Rinku
    Yoshii, Kazutomo
    Hoffman, Henry
    Malony, Allen
    Schulz, Martin
    Beckman, Pete
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 1118 - 1121
  • [37] Quantum ESPRESSO: One Further Step toward the Exascale
    Carnimeo, Ivan
    Affinito, Fabio
    Baroni, Stefano
    Baseggio, Oscar
    Bellentani, Laura
    Bertossa, Riccardo
    Delugas, Pietro Davide
    Ruffino, Fabrizio Ferrari
    Orlandini, Sergio
    Spiga, Filippo
    Giannozzi, Paolo
    JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2023, 19 (20) : 6992 - 7006
  • [38] Task Scheduling Frameworks for Heterogeneous Computing Toward Exascale
    Sandokji, Suhelah
    Eassa, Fathy
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2018, 9 (10) : 234 - 243
  • [39] Toward a Self-aware System for Exascale Architectures
    Landwehr, Aaron
    Zuckerman, Stephane
    Gao, Guang R.
    EURO-PAR 2013: PARALLEL PROCESSING WORKSHOPS, 2014, 8374 : 812 - 822
  • [40] Alya toward exascale: algorithmic scalability using PSCToolkit
    Owen, Herbert
    Lehmkuhl, Oriol
    D'Ambra, Pasqua
    Durastante, Fabio
    Filippone, Salvatore
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (10): : 13533 - 13556