SSD Drive Failure Prediction on Alibaba Data Center Using Machine Learning

Cited by: 1
Authors
Chen, Lei [1]
Zhu, Zongpeng [2]
Li, Anyu [2]
Mashhadi, Najmeh [1]
Frickey, Robert [1]
Ye, Jinhe [1]
Guo, Xin [1]
Affiliations
[1] Solidigm, Data Ctr Div, San Jose, CA 95134 USA
[2] Alibaba Grp, Alibaba Cloud, Hangzhou, Peoples R China
Source
2022 14TH IEEE INTERNATIONAL MEMORY WORKSHOP (IMW 2022) | 2022
Keywords
SSD drive failure detection; SSD SMART Data; Ensemble Learning; Light GBM and Random Forest; RELIABILITY; MODEL
DOI
10.1109/IMW52921.2022.9779284
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Flash-based solid-state drives (SSDs) have become a critical storage tier in data centers and enterprise storage systems, and cloud companies have a strong interest in predicting drive failures. Failure prediction makes it possible to schedule drive replacement and data backup in advance, and it helps in planning drive purchasing strategies. Solidigm and Alibaba collaborated to collect and analyze Self-Monitoring, Analysis, and Reporting Technology (SMART) data and to predict SSD failures 30 days ahead of time using machine learning techniques. In this paper, we use group k-fold cross-validation to select the best parameters for the machine learning models and to avoid overfitting. After the model produces a prediction score for each sample, a neural-network post-processing step is applied to those scores to obtain a drive-level prediction. A modified ensemble learning method, based on majority voting across different LightGBM and Random Forest models, is designed and implemented to further improve the prediction results. This paper is the first work in both academia and the storage industry to design a drive failure prediction system for data center deployment by optimizing models for the highest Precision rather than the highest F1-score, in order to minimize the false positive rate. We achieve drive failure prediction with 100% Precision and 21% Recall, which avoids the high cost of false positives.
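The abstract names three ideas that a small example may make more concrete: group k-fold cross-validation, majority voting across models, and evaluation by Precision and Recall. The sketch below is only an illustration in plain Python and is not the authors' implementation. The function names, the drive IDs, the toy labels, and the three "models" are all hypothetical, and the voting shown is plain strict-majority voting rather than the modified ensemble method the paper describes.

```python
def group_k_fold(groups, k):
    """Yield (train_idx, val_idx) pairs in which no group is split across folds.

    Here a group is assumed to be a drive ID, so every sample from one drive
    lands entirely in training or entirely in validation.
    """
    unique = sorted(set(groups))
    fold_of = {g: i % k for i, g in enumerate(unique)}  # round-robin assignment
    for fold in range(k):
        val = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, val


def majority_vote(predictions):
    """Combine per-model 0/1 predictions; a strict majority is needed for 1."""
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 meaning failure."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: two SMART samples per drive, four drives.
drive_ids = ["d1", "d1", "d2", "d2", "d3", "d3", "d4", "d4"]
folds = list(group_k_fold(drive_ids, k=2))

# Hypothetical drive-level labels and three hypothetical model outputs.
y_true = [1, 1, 1, 0, 0]
model_a = [1, 0, 1, 0, 0]
model_b = [1, 0, 0, 0, 1]
model_c = [1, 1, 0, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
precision, recall = precision_recall(y_true, ensemble)
print(ensemble, precision, recall)
```

In this toy case the vote keeps only the one failure that all three models agree on, so it raises no false alarms but misses two of the three failing drives. That is the same kind of trade-off the abstract describes when it favors Precision over F1-score.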
Pages: 29-33
Page count: 5