Failure prediction using machine learning in a virtualised HPC system and application

Cited by: 1
Authors
Bashir Mohammed
Irfan Awan
Hassan Ugail
Muhammad Younas
Affiliations
[1] University of Bradford, School of Electrical Engineering and Computer Science
[2] Oxford Brookes University, Department of Computing & Communication Technologies
Source
Cluster Computing | 2019 / Vol. 22
Keywords
Failure; Machine learning; High performance computing; Cloud computing
DOI
Not available
Abstract
Failure is an increasingly important issue in high performance computing and cloud systems. As large-scale systems continue to grow in scale and complexity, mitigating the impact of failure and providing accurate predictions with sufficient lead time remain challenging research problems. Existing fault-tolerance strategies such as regular checkpointing and replication are not adequate because of the emerging complexities of high performance computing systems. This calls for an effective, proactive failure management approach aimed at minimizing the effect of failure within the system. With the advent of machine learning techniques, the ability to learn from past information to predict future patterns of behaviour makes it possible to predict potential system failure more accurately. Thus, in this paper, we explore the predictive abilities of machine learning by applying a number of algorithms to improve the accuracy of failure prediction. We have developed a failure prediction model using time series and machine learning, and performed comparison-based tests on the prediction accuracy. The primary algorithms we considered are the support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), classification and regression trees (CART) and linear discriminant analysis (LDA). Experimental results indicate that our model achieves an average failure-prediction accuracy of 90% when using SVM, making it more accurate and effective than the other algorithms considered. This finding implies that our method can effectively predict all possible future system and application failures within the system.
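As a rough illustration of the kind of pipeline the abstract describes (time-series readings turned into labelled samples, then scored for prediction accuracy), the sketch below uses only the Python standard library. It is not the authors' model: it implements just one of the five listed algorithms (KNN), and the load trace, the failure timestamps, and the `window` and `lead` parameters are all made-up assumptions for demonstration.

```python
from collections import Counter
from math import dist


def make_windows(series, failure_times, window, lead):
    """Turn a metric series into (features, label) pairs.

    Features: the `window` most recent readings ending at time t.
    Label: 1 if a failure occurs in (t, t + lead], else 0.
    """
    failure_set = set(failure_times)
    samples = []
    for t in range(window - 1, len(series) - lead):
        features = tuple(series[t - window + 1 : t + 1])
        label = int(any((t + step) in failure_set for step in range(1, lead + 1)))
        samples.append((features, label))
    return samples


def knn_predict(train, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda s: dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Hypothetical load trace: load climbs before each failure, then resets.
series = [0.2, 0.2, 0.3, 0.6, 0.9, 0.2, 0.2, 0.3, 0.6, 0.9,
          0.2, 0.2, 0.3, 0.6, 0.9, 0.2]
failure_times = [5, 10, 15]  # hypothetical failure timestamps

samples = make_windows(series, failure_times, window=2, lead=1)
train, test = samples[:10], samples[10:]  # chronological split, no shuffling
acc = accuracy(train, test, k=3)
print(f"KNN accuracy on the toy trace: {acc:.2f}")
```

The split is chronological so that the classifier is only ever scored on readings that come after its training data, which mirrors the lead-time requirement mentioned in the abstract. Because the toy trace is perfectly periodic, the resulting score says nothing about the 90% figure reported for SVM.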
Pages: 471-485
Number of pages: 14