Argo NodeOS: Toward Unified Resource Management for Exascale

Cited by: 7
Authors
Perarnau, Swann [1]
Zounmevo, Judicael A. [1]
Dreher, Matthieu [1]
Van Essen, Brian C. [3]
Gioiosa, Roberto [2]
Iskra, Kamil [1]
Gokhale, Maya B. [3]
Yoshii, Kazutomo [1]
Beckman, Pete [1]
Affiliations
[1] Argonne Natl Lab, Argonne, IL 60439 USA
[2] Pacific Northwest Natl Lab, Richland, WA 99352 USA
[3] Lawrence Livermore Natl Lab, Livermore, CA USA
Source
2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) | 2017
Funding
National Science Foundation (USA)
DOI
10.1109/IPDPS.2017.25
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts. We extend the memory management of Linux to be able to subdivide NUMA memory nodes, allowing better resource partitioning among processes running on the same node. We also add support for memory-mapped access to node-local, PCIe-attached NVRAM devices and introduce a new scheduling class targeted at parallel runtimes supporting user-level load balancing. These features are unified into compute containers, a containerization approach focused on providing modern HPC applications with dynamic control over a wide range of kernel interfaces. To keep our approach compatible with industrial containerization products, we also identify contention points for the adoption of containers in HPC settings. Each NodeOS feature is evaluated by using a set of parallel benchmarks, miniapps, and coupled applications consisting of simulation and data analysis components, running on a modern NUMA platform. We observe out-of-the-box performance improvements easily matching, and often exceeding, those observed with expert-optimized configurations on standard OS kernels. Our lightweight approach to resource management retains the many benefits of a full OS kernel that application programmers have learned to depend on, at the same time providing a set of extensions that can be freely mixed and matched to best benefit particular application components.
Pages: 153-162
Page count: 10