A Machine-Learning Approach for Communication Prediction of Large-Scale Applications

Times cited: 6
Authors
Papadopoulou, Nikela [1]
Goumas, Georgios [1]
Koziris, Nectarios [1]
Affiliation
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, GR-10682 Athens, Greece
Source
2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015 | 2015
Keywords
performance prediction; communication time; MPI applications; large-scale systems
DOI
10.1109/CLUSTER.2015.27
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In this paper we present a machine-learning approach to predicting the total communication time of parallel applications. Communication time depends heavily on a very wide set of parameters related to the architecture, the runtime configuration and the application's communication profile. We focus our study on parameters that can be easily extracted from the application and the process mapping ahead of execution. To this end, we define a small set of descriptive metrics and build a simple benchmark that sweeps over the parameter space in a straightforward way. We use the benchmarking data to train a robust multiple-variable regression model, which serves as our communication predictor. Our experimental results show notable accuracy in predicting the communication time of two indicative application kernels on a supercomputer, using from a few dozen to a few thousand processing cores.
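The abstract does not list the paper's metrics or its exact fitting procedure, so the following is only a hedged illustration of the general idea of a multiple-variable regression used as a communication-time predictor. It fits a plain ordinary-least-squares linear model (not the "robust" regression the authors mention) by solving the normal equations with Gaussian elimination. The three input metrics (MB sent per process, messages per process, fraction of off-node peers), the helper names fit, predict and solve, and the training data are all hypothetical assumptions; the data are synthetic, generated from a known linear rule, and are not measurements from the paper.

```python
"""Minimal multiple-variable linear regression sketch (ordinary least squares).

Hypothetical setup: each benchmark run is described by a few metrics that can be
computed before execution, and the target is the measured communication time.
The metric names used below are illustrative assumptions, not the paper's metrics.
"""
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations apply to a and b together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Choose the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by solving the normal equations X^T X b = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], feature: Sequence[float]) -> float:
    """Predict the target (e.g. communication time) for one feature vector."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], feature))


if __name__ == "__main__":
    # Synthetic training data (NOT measurements). Hypothetical metrics per run:
    # (MB sent per process, messages per process, fraction of off-node peers).
    # The "time" comes from a known linear rule so the fitted model can be checked.
    runs = [(1.0, 10, 0.0), (2.0, 10, 0.5), (4.0, 20, 0.5),
            (8.0, 20, 1.0), (16.0, 40, 0.25), (3.0, 5, 0.75)]
    times = [0.5 + 0.2 * mb + 0.01 * msgs + 1.5 * off for mb, msgs, off in runs]
    coef = fit(runs, times)
    print("coefficients:", [round(c, 6) for c in coef])
    print("predicted time for an unseen run:", round(predict(coef, (10.0, 30, 0.5)), 6))
```

Because the synthetic targets are an exact linear function of the inputs, the fitted coefficients should recover the generating rule. A real predictor would be trained on benchmark measurements and would likely need a robust loss or outlier handling, which this sketch deliberately omits.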
Pages: 120-123
Number of pages: 4