Argo NodeOS: Toward Unified Resource Management for Exascale

Times cited: 7
Authors
Perarnau, Swann [1]
Zounmevo, Judicael A. [1]
Dreher, Matthieu [1]
Van Essen, Brian C. [3]
Gioiosa, Roberto [2]
Iskra, Kamil [1]
Gokhale, Maya B. [3]
Yoshii, Kazutomo [1]
Beckman, Pete [1]
Affiliations
[1] Argonne Natl Lab, Argonne, IL 60439 USA
[2] Pacific Northwest Natl Lab, Richland, WA 99352 USA
[3] Lawrence Livermore Natl Lab, Livermore, CA USA
Source
2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) | 2017
Funding
U.S. National Science Foundation
DOI
10.1109/IPDPS.2017.25
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts. We extend the memory management of Linux to be able to subdivide NUMA memory nodes, allowing better resource partitioning among processes running on the same node. We also add support for memory-mapped access to node-local, PCIe-attached NVRAM devices and introduce a new scheduling class targeted at parallel runtimes supporting user-level load balancing. These features are unified into compute containers, a containerization approach focused on providing modern HPC applications with dynamic control over a wide range of kernel interfaces. To keep our approach compatible with industrial containerization products, we also identify contention points for the adoption of containers in HPC settings. Each NodeOS feature is evaluated by using a set of parallel benchmarks, miniapps, and coupled applications consisting of simulation and data analysis components, running on a modern NUMA platform. We observe out-of-the-box performance improvements easily matching, and often exceeding, those observed with expert-optimized configurations on standard OS kernels. Our lightweight approach to resource management retains the many benefits of a full OS kernel that application programmers have learned to depend on, while providing a set of extensions that can be freely mixed and matched to best benefit particular application components.
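The abstract mentions memory-mapped access to node-local NVRAM but does not describe the NodeOS interface for it. As a rough, stock-Linux analog only (not the Argo mechanism), the sketch below maps a file into the process address space with Python's standard `mmap` module, so the application reads and writes it with ordinary memory operations. It assumes the file would live on a filesystem backed by the NVRAM device; the helper names `write_via_mmap` and `read_via_mmap` and the example path are hypothetical.

```python
import mmap
import os
import tempfile


def write_via_mmap(path: str, offset: int, payload: bytes, size: int) -> None:
    """Write `payload` at `offset` in `path` through a shared read/write mapping.

    The file is created if missing and extended to at least `size` bytes,
    because a mapping cannot reach past the end of its backing file.
    """
    if offset < 0 or offset + len(payload) > size:
        raise ValueError("payload does not fit in the mapped region")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)  # grow (zero-filled) before mapping
        with mmap.mmap(fd, size) as mm:  # shared, writable mapping by default
            mm[offset:offset + len(payload)] = payload  # plain memory store
            mm.flush()  # ask the kernel to write dirty pages back to the file
    finally:
        os.close(fd)


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` from `path` through a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm[offset:offset + length])


if __name__ == "__main__":
    # Hypothetical location; on a real node this would be an NVRAM-backed mount.
    path = os.path.join(tempfile.mkdtemp(), "nvram_region.bin")
    write_via_mmap(path, 4096, b"checkpoint", 1 << 20)
    print(read_via_mmap(path, 4096, 10))
```

With stock kernels, such a mapping goes through the generic page cache; the abstract's claim is that NodeOS adds support specifically for PCIe-attached NVRAM, whose details are in the paper rather than this record.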
Pages: 153-162
Page count: 10