Prediction method of port blocking failure in high performance interconnection networks

Cited by: 0
Authors
Xu J. [1]
Hu X. [1]
Yang H. [1]
Wang Q. [1]
Zhang L. [1]
Tang F. [1]
Affiliation
[1] College of Computer Science and Technology, National University of Defense Technology, Changsha
Keywords
failure prediction; interconnection network; machine learning
DOI
10.11887/j.cn.202205001
Abstract
As system scale, chip power consumption, and link rates increase, the overall failure rate of high-performance interconnection networks will continue to rise, and traditional operation and maintenance methods will become difficult to sustain, posing great challenges to the overall reliability and availability of HPC (high performance computing) systems. An unsupervised prediction model for serious network failures such as port blocking is proposed. In this model, symptomatic rules are extracted from the historical information of the switch port status registers to form a new feature vector, and the K-means clustering algorithm is used to learn and classify the feature vectors. At prediction time, the DES (double exponential smoothing) algorithm is used to forecast the future state of a port from its current state, yielding a new feature vector, and the K-means algorithm is then used to predict whether a port blocking failure will occur. Topology information is used to build independent sub-models for ToR switch ports and Spine switch ports, which further improves prediction accuracy. The experimental results show that the prediction model maintains a recall rate of 88.2% and reaches an accuracy rate of 65.2%. It can provide effective early warning and guidance for operation and maintenance personnel in real systems. © 2022 National University of Defense Technology. All rights reserved.
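The pipeline described in the abstract (smooth each port counter forward with DES, then assign the forecast feature vector to a K-means cluster) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The two features (`symbol_errors`, `credit_stalls`), the sample values, the smoothing factor, the forecast horizon, and the rule that the cluster with the larger centroid is the "blocking-prone" one are all assumptions made for this example. Brown's variant of double exponential smoothing is assumed, since the abstract does not say which variant is used.

```python
from math import dist


def des_forecast(series, alpha=0.5, horizon=1):
    """Brown's double exponential smoothing, forecasting `horizon` steps ahead."""
    s1 = s2 = series[0]
    for x in series[1:]:
        s1 = alpha * x + (1 - alpha) * s1    # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2   # second smoothing pass
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    return level + horizon * trend


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def predict_blocking(history, centroids, risky, horizon=3):
    """Forecast each counter with DES, then check the nearest centroid."""
    feature = tuple(des_forecast(series, horizon=horizon) for series in history)
    nearest = min(range(len(centroids)), key=lambda i: dist(feature, centroids[i]))
    return nearest == risky


# Hypothetical training vectors: (symbol_errors, credit_stalls) per port.
training = [(1, 0), (2, 1), (0, 1), (1, 2), (40, 30), (45, 35), (50, 33)]
centroids = kmeans(training, k=2)
# Assumption: the cluster with the larger centroid is the blocking-prone one.
risky = max(range(len(centroids)), key=lambda i: sum(centroids[i]))

degrading_port = ([2, 5, 9, 16, 24, 33], [1, 3, 6, 12, 18, 25])
stable_port = ([1, 1, 2, 1, 1, 1], [0, 1, 1, 0, 1, 1])
print(predict_blocking(degrading_port, centroids, risky))
print(predict_blocking(stable_port, centroids, risky))
```

To mirror the topology-aware design mentioned in the abstract, one could fit a separate set of centroids for ToR ports and for Spine ports and select the matching set at prediction time. How the paper actually partitions and labels the clusters is not stated here.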
Pages: 1-12
Number of pages: 11