Toward Rapid Understanding of Production HPC Applications and Systems

Times cited: 20
Authors
Agelastos, Anthony [1]
Allan, Benjamin [1]
Brandt, Jim [1]
Gentile, Ann [1]
Lefantzi, Sophia [1]
Monk, Steve [1]
Ogden, Jeff [1]
Rajan, Mahesh [1]
Stevenson, Joel [1]
Affiliation
[1] Sandia National Laboratories, PO Box 5800, Albuquerque, NM 87185, USA
Source
2015 IEEE International Conference on Cluster Computing (CLUSTER 2015), 2015
Keywords
High Performance Computing; Monitoring
DOI
10.1109/CLUSTER.2015.71
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
A detailed understanding of HPC applications' resource needs and their complex interactions with each other and with HPC platform resources is critical to achieving scalability and performance. Such understanding has been difficult to achieve for two reasons. First, typical application profiling tools do not capture the behaviors of codes under the potentially wide spectrum of actual production conditions. Second, typical monitoring tools do not capture system resource usage information with high enough fidelity to gain sufficient insight into application performance and demands. In this paper we present both system and application profiling results based on data obtained through synchronized system-wide monitoring on a production HPC cluster at Sandia National Laboratories (SNL). We demonstrate analytic and visualization techniques that we are using to characterize application and system resource usage under production conditions, for better understanding of application resource needs. Our goals are to improve application performance (through understanding application-to-resource mapping and system throughput) and to ensure that future system capabilities match their intended workloads.
Pages: 464-473
Page count: 10