A scalable framework for online power modelling of high-performance computing nodes in production

Cited by: 3
Authors
Pittino, Federico [1 ]
Beneventi, Francesco [1 ]
Bartolini, Andrea [1 ]
Benini, Luca [1 ,2 ]
Affiliations
[1] Univ Bologna, Dept Elect Elect & Informat Engn DEI, Bologna, Italy
[2] Swiss Fed Inst Technol, Integrated Syst Lab, Zurich, Switzerland
Source
PROCEEDINGS 2018 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS) | 2018
Keywords
power model; HPC cluster in production; machine learning; scalable framework;
DOI
10.1109/HPCS.2018.00058
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Power and thermal design and management are critical for high performance computing (HPC) systems because of their high power density and large total power consumption. Many HPC power management strategies rely on accurate, compact power models that can predict power consumption and track its sensitivity to workload parameters and operating points. In this paper we describe a methodology and a framework for training power models, derived with two best-in-class procedures, directly on in-production nodes while they are online, without requiring dedicated training instances. The compact power models are obtained with an online regression-based approach that can track non-stationary workloads and hardware variability. Our experiments on a real-life HPC system demonstrate that the models achieve very high accuracy over all operating modes. We also demonstrate the scalability of our approach and the small amount of resources needed for online modeling, in both the training and inference phases.
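The abstract does not say which regression procedures the paper uses, so the following is only a generic sketch of what an online regression-based power model can look like, not the authors' method. It uses recursive least squares with a forgetting factor, a standard technique for following non-stationary data. The feature set (a constant term, CPU utilization, normalized clock frequency), the synthetic power coefficients, and all names in the code are assumptions made for illustration.

```python
import random


class RecursiveLeastSquares:
    """Online linear regression with exponential forgetting (illustrative only)."""

    def __init__(self, n_features, forgetting=0.99, delta=1000.0):
        self.lam = forgetting            # values below 1 discount older samples
        self.w = [0.0] * n_features      # current model coefficients
        # P tracks the inverse of the discounted feature correlation matrix.
        self.P = [[delta if i == j else 0.0 for j in range(n_features)]
                  for i in range(n_features)]

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        """Fold one (features, measured power) sample into the model."""
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        err = y - self.predict(x)        # prediction error before the update
        self.w = [self.w[i] + gain[i] * err for i in range(n)]
        # P is symmetric, so the row vector x^T P equals Px transposed.
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.lam
                   for j in range(n)] for i in range(n)]
        return err


def simulate(n_samples=500, noise=0.5, seed=0):
    """Train on synthetic samples; the 50/30/10 W coefficients are made up."""
    rng = random.Random(seed)
    model = RecursiveLeastSquares(n_features=3)
    for _ in range(n_samples):
        util = rng.random()              # hypothetical CPU utilization in [0, 1]
        freq = rng.random()              # hypothetical normalized clock frequency
        power = 50.0 + 30.0 * util + 10.0 * freq + rng.gauss(0.0, noise)
        model.update([1.0, util, freq], power)
    return model


if __name__ == "__main__":
    print(simulate().w)
```

Each update costs a fixed amount of work that depends only on the number of features, which is the property that makes this family of methods suitable for per-node, in-production training.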
Pages: 300-307
Page count: 8
Related papers
50 in total
  • [1] A Scalable Runtime Fault Localization Framework for High-Performance Computing Systems
    Gao, Jian
    Wei, Hongmei
    Yu, Kang
    Qing, Peng
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2018, 46 (04) : 749 - 761
  • [2] A Scalable Runtime Fault Localization Framework for High-Performance Computing Systems
    Jian Gao
    Hongmei Wei
    Kang Yu
    Peng Qing
    International Journal of Parallel Programming, 2018, 46 : 749 - 761
  • [3] Scalable I/O Forwarding Framework for High-Performance Computing Systems
    Ali, Nawab
    Carns, Philip
    Iskra, Kamil
    Kimpe, Dries
    Lang, Samuel
    Latham, Robert
    Ross, Robert
    Ward, Lee
    Sadayappan, P.
    2009 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING AND WORKSHOPS, 2009, : 86 - +
  • [4] An Extended IMS Framework With a High-Performance and Scalable Distributed Storage and Computing System
    Seraoui, Youssef
    Raouyane, Brahim
    Bellafkih, Mostafa
    2017 INTERNATIONAL SYMPOSIUM ON NETWORKS, COMPUTERS AND COMMUNICATIONS (ISNCC), 2017,
  • [5] IKAROS: A scalable I/O framework for high-performance computing systems.
    Filippidis, Christos
    Tsanakas, Panayiotis
    Cotronis, Yiannis
    JOURNAL OF SYSTEMS AND SOFTWARE, 2016, 118 : 277 - 287
  • [6] High-Performance Computing based Scalable Online Fuzzy Clustering Algorithms for Big Data
    Jha, Preeti
    Tiwari, Aruna
    Bharill, Neha
    Ratnaparkhe, Milind
    Patel, Om Prakash
    Pulakitha, Rapolu
    Chauhan, Aditi
    2022 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2022, : 1400 - 1407
  • [7] High Performance Scalable Computing performance modelling using Ptolemy
    Pauer, Eric K.
    International Journal of Modelling and Simulation, 1999, 19 (04): : 341 - 351
  • [8] A scalable high-performance computing solution for networks on chips
    Forsell, M
    IEEE MICRO, 2002, 22 (05) : 46 - 55
  • [9] Cloud Computing based High-performance Platform in Enabling Scalable Services in Power System
    Deng, Chuang
    Liu, Junyong
    Liu, Yang
    Yu, Zhen
    2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2016, : 2200 - 2203
  • [10] Scalable quantum detector tomography by high-performance computing
    Schapeler, Timon
    Schade, Robert
    Lass, Michael
    Plessl, Christian
    Bartley, Tim J.
    QUANTUM SCIENCE AND TECHNOLOGY, 2025, 10 (01):