Verifying Causality between Distant Performance Phenomena in Large-Scale MPI Applications

Cited by: 2
Authors
Hermanns, Marc-Andre [1]
Geimer, Markus [1]
Wolf, Felix [1]
Wylie, Brian J. N. [1]
Affiliation
[1] Forschungszentrum Julich, Julich Supercomp Ctr, D-52425 Julich, Germany
Source
PROCEEDINGS OF THE PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING | 2009
Keywords
PARALLEL;
DOI
10.1109/.49
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
In message-passing applications, the temporal or spatial distance between cause and symptom of a performance problem constitutes a major difficulty in deriving helpful conclusions from performance data. Just knowing the locations of wait states in the program is often insufficient to understand the reason for their occurrence. We present a method for verifying hypotheses on causality between temporally or spatially distant performance phenomena in message-passing applications without altering the application itself. The verification is accomplished by modifying MPI event traces and using them to simulate the hypothetical message-passing behavior. By performing a parallel real-time reenactment of the communication to be simulated using the original execution configuration, we can achieve high scalability and good predictive accuracy in relation to the measured behavior. Not relying on a potentially complex model of the message-passing subsystem, our method is also platform independent.
Pages: 78-84
Page count: 7