2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015
2015
Keywords:
performance prediction;
communication time;
MPI applications;
large-scale systems;
DOI:
10.1109/CLUSTER.2015.27
Chinese Library Classification number:
TP3 [Computing technology, computer technology];
Subject classification code:
0812;
Abstract:
In this paper we present a machine-learning approach to predicting the total communication time of parallel applications. Communication time depends heavily on a wide set of parameters related to the architecture, the runtime configuration, and the application's communication profile. We focus our study on parameters that can be easily extracted from the application and the process mapping ahead of execution. To this end, we define a small set of descriptive metrics and build a simple benchmark that sweeps over the parameter space in a straightforward way. We use this benchmarking data to train a robust multiple-variable regression model, which serves as our communication predictor. Our experimental results show notable accuracy in predicting the communication time of two indicative application kernels on a supercomputer, using from a few dozen to a few thousand processing cores.
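The workflow the abstract describes (sweep a benchmark over a parameter space, fit a multiple-variable regression, then predict) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two features (message size and process count), the synthetic timings, and the use of plain ordinary least squares in place of the robust regression the abstract mentions. None of it is taken from the paper.

```python
# Minimal sketch: fit a multiple-variable linear regression on synthetic
# "benchmark" samples and use it to predict communication time.
# Feature names and all numbers are hypothetical, not taken from the paper.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations cover both sides of the system.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept term
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature):
    """Evaluate the fitted linear model for one feature vector."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature))


# Hypothetical benchmark sweep: (message size in KiB, number of processes).
samples = [(m, p) for m in (1, 4, 16, 64) for p in (32, 128, 512, 2048)]
# Synthetic timings from a made-up linear law, used only as demo data.
times = [0.5 + 0.02 * m + 0.001 * p for m, p in samples]

weights = fit(samples, times)
estimate = predict(weights, (32, 1024))  # a configuration not in the sweep
print(weights, estimate)
```

Because the synthetic timings are exactly linear in both features, the fit recovers the generating coefficients up to floating-point rounding. Measured communication times would be noisy and possibly non-linear, which is where a robust estimator and carefully chosen descriptive metrics would matter.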
Pages: 120-123
Page count: 4