Active Measurement of the Impact of Network Switch Utilization on Application Performance

Cited by: 6
Authors
Casas, Marc [1]
Bronevetsky, Greg [2]
Affiliations
[1] Barcelona Supercomp Ctr, 29 Nexus II Bldg, Barcelona 08034, Spain
[2] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
Source
2014 IEEE 28th International Parallel and Distributed Processing Symposium | 2014
Funding
European Research Council
Keywords
DESIGN
DOI
10.1109/IPDPS.2014.28
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Inter-node networks are a key capability of High-Performance Computing (HPC) systems that differentiates them from less capable classes of machines. However, in spite of their very high performance, the increasing computational power of HPC compute nodes and the associated rise in application communication needs make network performance a common performance bottleneck. To achieve high performance in spite of network limitations, application developers require tools to measure their applications' network utilization and inform them about how the network's communication capacity relates to the performance of their applications. This paper presents a new performance measurement and analysis methodology based on empirical measurements of network behavior. Our approach uses two benchmarks that inject extra network communication. The first probes the fraction of the network that is utilized by a software component (an application or an individual task) to determine the existence and severity of network contention. The second aggressively injects network traffic while a software component runs to evaluate its performance on less capable networks or when it shares the network with other software components. We then combine the information from the two types of experiments to predict the performance slowdown experienced by multiple software components (e.g., multiple processes of a single MPI application) when they share a single network. Our methodology is applied to individual network switches and demonstrated by taking 6 representative HPC applications and predicting the performance slowdowns of the 36 possible application pairs. The average error of our predictions is less than 10%.
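The abstract does not state how the two kinds of measurements are combined, so the following is only a minimal sketch of one plausible combination, not the authors' model. It assumes the first benchmark yields a single utilization fraction per application and the second yields a slowdown-versus-injected-traffic curve per application; the predicted slowdown of a "victim" sharing a switch with a "neighbor" is then read off the victim's curve at the neighbor's utilization. All application names, numbers, and function names are hypothetical.

```python
from itertools import product

# Hypothetical data, invented for illustration only.
# utilization[app]: fraction of switch capacity the app uses on its own
# (the kind of quantity the first, probing benchmark is meant to estimate).
utilization = {"A": 0.10, "B": 0.35, "C": 0.60}

# slowdown_curve[app]: (fraction of capacity taken by injected traffic,
# observed slowdown factor) pairs, sorted by fraction
# (the kind of data the second, traffic-injecting benchmark would produce).
slowdown_curve = {
    "A": [(0.0, 1.00), (0.5, 1.05), (1.0, 1.20)],
    "B": [(0.0, 1.00), (0.5, 1.20), (1.0, 1.80)],
    "C": [(0.0, 1.00), (0.5, 1.50), (1.0, 2.50)],
}


def interpolate(points, x):
    """Piecewise-linear interpolation, clamped to the measured range."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def predict_slowdown(victim, neighbor):
    """Assumed model: the neighbor acts like injected traffic equal to its utilization."""
    return interpolate(slowdown_curve[victim], utilization[neighbor])


# Ordered pairs (victim, neighbor), including an application paired with itself.
predictions = {
    (victim, neighbor): predict_slowdown(victim, neighbor)
    for victim, neighbor in product(utilization, repeat=2)
}
```

With ordered pairs that include self-pairs, 6 applications give 6 × 6 = 36 combinations, which matches the pair count quoted in the abstract; the three hypothetical applications here give 9.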
Pages: 10