A Conceptual Framework for HPC Operational Data Analytics

Cited by: 6
Authors
Netti, Alessio [1]
Shin, Woong [2]
Ott, Michael [1]
Wilde, Torsten [3]
Bates, Natalie [4]
Affiliations
[1] Leibniz Supercomp Ctr, Garching, Germany
[2] Oak Ridge Natl Lab, Oak Ridge, TN USA
[3] Hewlett Packard Enterprise, Houston, TX USA
[4] Energy Efficient HPC Working Grp, Houston, TX USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2021) | 2021
Funding
EU Horizon 2020;
Keywords
Exascale; Top500; HPC operations; Energy efficiency; Operational data; PERFORMANCE; PREDICTION; MANAGEMENT; EFFICIENCY;
DOI
10.1109/Cluster48925.2021.00086
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper provides a broad framework for understanding trends in Operational Data Analytics (ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to allow for the continuous monitoring, archiving, and analysis of near real-time performance data, providing immediately actionable information for multiple operational uses. In this work, we combine two models to provide a comprehensive HPC ODA framework: one is an evolutionary model of analytics capabilities that consists of four types, which are descriptive, diagnostic, predictive and prescriptive, while the other is a four-pillar model for energy-efficient HPC operations that covers facility, system hardware, system software, and applications. This new framework is then overlaid with a description of current development and production deployments of ODA within leading-edge HPC facilities. Finally, we perform a comprehensive survey of ODA works and classify them according to our framework, in order to demonstrate its effectiveness.
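The two models combined in the abstract can be pictured as a 4 x 4 grid: four analytics types crossed with four pillars. The following is a minimal sketch of that grid, not the authors' implementation. The names `AnalyticsType`, `Pillar`, and `classify`, the integer ordering of the analytics types, and the example entries are all assumptions made for illustration; none of them come from the paper.

```python
from collections import defaultdict
from enum import Enum


class AnalyticsType(Enum):
    # The four analytics types named in the abstract. The integer values
    # are an assumed encoding of the evolutionary order.
    DESCRIPTIVE = 1
    DIAGNOSTIC = 2
    PREDICTIVE = 3
    PRESCRIPTIVE = 4


class Pillar(Enum):
    # The four pillars of energy-efficient HPC operations from the abstract.
    FACILITY = "facility"
    SYSTEM_HARDWARE = "system hardware"
    SYSTEM_SOFTWARE = "system software"
    APPLICATIONS = "applications"


def classify(works):
    """Group (name, analytics_type, pillar) tuples by grid cell."""
    grid = defaultdict(list)
    for name, analytics_type, pillar in works:
        grid[(analytics_type, pillar)].append(name)
    return dict(grid)


# Hypothetical entries, invented for illustration only.
example_works = [
    ("cooling-dashboard", AnalyticsType.DESCRIPTIVE, Pillar.FACILITY),
    ("node-failure-forecast", AnalyticsType.PREDICTIVE, Pillar.SYSTEM_HARDWARE),
    ("power-aware-scheduler", AnalyticsType.PRESCRIPTIVE, Pillar.SYSTEM_SOFTWARE),
]
grid = classify(example_works)
```

Keying the grid on an `(AnalyticsType, Pillar)` pair keeps each surveyed work in exactly one cell; a work that spans several cells would need one tuple per cell under this sketch.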
Pages: 596-603
Page count: 8