Estimation of RTT and Loss Rate of Wide-Area Connections Using MPI Measurements

Cited by: 0
Authors
Rao, Nageswara S. V. [1]
Imam, Neena [1]
Liu, Zhengchun [2]
Kettimuthu, Raj [2]
Foster, Ian [2]
Affiliations
[1] Oak Ridge Natl Lab, Oak Ridge, TN 37830 USA
[2] Argonne Natl Lab, 9700 S Cass Ave, Argonne, IL 60439 USA
Source
PROCEEDINGS OF 6TH IEEE/ACM ANNUAL INTERNATIONAL WORKSHOP ON INNOVATING THE NETWORK FOR DATA-INTENSIVE SCIENCE (INDIS) 2019 | 2019
Keywords
MPI; wide-area networks; execution time; network measurements; RTT; loss rate
DOI
10.1109/INDIS49552.2019.00008
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Scientific computations are expected to be increasingly distributed across wide-area networks, and the Message Passing Interface (MPI) has been shown to scale to support their communications over long distances. These computations should account for certain network parameters to ensure effective execution, for example, by avoiding highly congested and long connections. The execution times of basic MPI operations reflect the connection parameters, including the Round Trip Time (RTT) and loss rate. We describe five machine learning methods to estimate the connection RTT and loss rate using execution times of basic MPI operations. We utilize execution time measurements of MPI_Sendrecv operations collected over emulated 10 Gbps connections with 0-366 ms round-trip times, wherein the longest connection spans the globe, under periodic losses of up to 20%. These methods provide disparate estimates of RTT and loss rate, namely, linear and non-linear, and smooth and non-smooth. Our results show that accurate estimates can be generated at low loss rates, but they become inaccurate at loss rates of 10% and higher. Overall, these results constitute a case study of the strengths and limitations of machine learning methods in inferring network-level parameters using application-level measurements.
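As an illustration of the general idea in the abstract (inferring RTT from application-level execution times), the following is a minimal sketch of a linear estimator, which is one of the broad classes of estimate the abstract mentions. Everything concrete here is an assumption made for illustration: the training data is synthetic, the toy timing model (a fixed overhead plus two round trips plus Gaussian noise) is hypothetical, and the names `fit_linear`, `synthetic_exec_time_ms`, and `estimate_rtt_ms` are not taken from the paper. It does not reproduce the authors' five methods or their measurements, and it ignores packet loss entirely.

```python
import random


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def synthetic_exec_time_ms(rtt_ms, rng):
    """ASSUMPTION: toy model, not measured data.

    Execution time = 5 ms fixed overhead + 2 round trips + Gaussian noise.
    """
    return 5.0 + 2.0 * rtt_ms + rng.gauss(0.0, 1.0)


# Synthetic training set over the 0-366 ms RTT range quoted in the abstract.
rng = random.Random(0)
train_rtts = [rng.uniform(0.0, 366.0) for _ in range(200)]
train_times = [synthetic_exec_time_ms(r, rng) for r in train_rtts]

# Regress RTT on execution time, so a new timing maps directly to an RTT.
slope, intercept = fit_linear(train_times, train_rtts)


def estimate_rtt_ms(exec_time_ms):
    """Estimate RTT (ms) from one observed execution time (ms)."""
    return slope * exec_time_ms + intercept


true_rtt = 183.0
observed = synthetic_exec_time_ms(true_rtt, rng)
print(f"true RTT {true_rtt:.1f} ms, estimated {estimate_rtt_ms(observed):.1f} ms")
```

In this toy setting the fit is nearly exact because the synthetic timings are linear in RTT by construction. Under loss, a single execution time no longer determines the RTT, which is consistent with the abstract's observation that estimates degrade at loss rates of 10% and higher.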
Pages: 17-24
Page count: 8