Examining the impact of critical attributes on hard drive failure times: Multi-state models for left-truncated and right-censored semi-competing risks data

Cited by: 1
Authors
Oakley, Jordan L. [1]
Forshaw, Matthew [2]
Philipson, Pete [1]
Wilson, Kevin J. [1]
Affiliations
[1] Newcastle Univ, Sch Math Stat & Phys, Newcastle Upon Tyne, England
[2] Newcastle Univ, Sch Comp, Newcastle Upon Tyne, England
Funding
Engineering and Physical Sciences Research Council (UK)
Keywords
Critical states; hard disk drives; multi-state models; semi-competing risks; SMART; SEMICOMPETING RISKS; SURVIVAL; IDENTIFICATION; PREDICTIONS; ISSUES;
DOI
10.1002/asmb.2829
Chinese Library Classification
C93 [Management science]; O22 [Operations research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
The ability to predict failures in hard disk drives (HDDs) is a major objective of HDD manufacturers since avoiding unexpected failures may prevent data loss, improve service reliability, and reduce data center downtime. Most HDDs are equipped with a threshold-based monitoring system named self-monitoring, analysis and reporting technology (SMART). The system collects several performance metrics, called SMART attributes, and detects anomalies that may indicate incipient failures. SMART works as a nascent failure detection method and does not estimate the HDDs' remaining useful life. We define critical attributes and critical states for hard drives using SMART attributes and fit multi-state models to the resulting semi-competing risks data. The multi-state models provide a coherent and novel way to model the failure time of a hard drive and allow us to examine the impact of critical attributes on the failure time of a hard drive. We derive dynamic predictions of conditional survival probabilities, which are adaptive to the state of the drive. Using a dataset of HDDs equipped with SMART, we find that drives are more likely to fail after entering critical states. We evaluate the predictive accuracy of the proposed models with a case study of HDDs equipped with SMART, using the time-dependent area under the receiver operating characteristic curve (AUC) and the expected prediction error (PE). The results suggest that accounting for changes in the critical attributes improves the accuracy of dynamic predictions.
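As a rough illustration of what "dynamic predictions of conditional survival probabilities, which are adaptive to the state of the drive" can mean, the sketch below uses a three-state illness-death structure (healthy, critical, failed) with constant transition hazards. This is an assumption made for illustration only: the abstract does not state the hazard form the authors use, and the rates `h01`, `h02`, `h12` and the function names are hypothetical.

```python
import math


def surv_from_critical(u, h12):
    """P(drive survives a further u time units | currently in the critical state).

    Assumes a constant critical -> failed hazard h12 (illustrative only).
    """
    return math.exp(-h12 * u)


def surv_from_healthy(u, h01, h02, h12):
    """P(drive survives a further u time units | currently in the healthy state).

    Assumed constant hazards (hypothetical values, not from the paper):
      h01: healthy -> critical
      h02: healthy -> failed
      h12: critical -> failed
    The drive survives either by staying healthy for the whole horizon, or by
    entering the critical state at some time s <= u and not failing afterwards.
    """
    total_exit = h01 + h02
    stay_healthy = math.exp(-total_exit * u)
    diff = total_exit - h12
    if abs(diff) < 1e-12:
        # Limiting case of the integral when h01 + h02 equals h12.
        via_critical = h01 * u * math.exp(-h12 * u)
    else:
        # Closed form of: integral_0^u h01 * exp(-(h01+h02)s) * exp(-h12(u-s)) ds
        via_critical = h01 * math.exp(-h12 * u) * (1.0 - math.exp(-diff * u)) / diff
    return stay_healthy + via_critical


if __name__ == "__main__":
    # Hypothetical daily hazards chosen only to show the state dependence.
    h01, h02, h12 = 0.01, 0.001, 0.05
    horizon = 30.0
    print("healthy :", surv_from_healthy(horizon, h01, h02, h12))
    print("critical:", surv_from_critical(horizon, h12))
```

With a critical-to-failed hazard larger than the healthy-to-failed hazard, the predicted survival probability over the same horizon drops once the drive is observed in the critical state, which is the qualitative behaviour the abstract reports.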
Pages: 684-709
Page count: 26