Performance and power modeling and prediction using MuMMI and 10 machine learning methods

Cited by: 3
Authors
Wu, Xingfu [1]
Taylor, Valerie [1]
Lan, Zhiling [2]
Affiliations
[1] Univ Chicago, Div Math & Comp Sci, Argonne Natl Lab, Lemont, IL 60439 USA
[2] IIT, Dept Comp Sci, Chicago, IL 60616 USA
Funding
National Science Foundation (USA);
Keywords
fault tolerant applications; machine learning; modeling; MuMMI; performance; power; prediction; FAULT-TOLERANCE; REGRESSION;
DOI
10.1002/cpe.7254
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202; 0835;
Abstract
Energy-efficient scientific applications require insight into how high performance computing system features impact the applications' power and performance. This insight can result from the development of performance and power models. In this article, we use the modeling and prediction tool MuMMI (Multiple Metrics Modeling Infrastructure) and 10 machine learning methods to model and predict performance and power consumption and compare their prediction error rates. We use an algorithm-based fault-tolerant linear algebra code and a multilevel checkpointing fault-tolerant heat distribution code to conduct our modeling and prediction study on the Cray XC40 Theta and IBM BG/Q Mira at Argonne National Laboratory and the Intel Haswell cluster Shepard at Sandia National Laboratories. Our experimental results show that the prediction error rates in performance and power using MuMMI are less than 10% for most cases. By utilizing the models for runtime, node power, CPU power, and memory power, we identify the most significant performance counters for potential application optimizations, and we predict theoretical outcomes of the optimizations. Based on two collected datasets, we analyze and compare the prediction accuracy in performance and power consumption using MuMMI and 10 machine learning methods.
Pages: 26