Failure Prediction, Lead Time Estimation and Health Degree Assessment for Hard Disk Drives Using Voting Based Decision Trees

Cited by: 8
Authors
Kaur, Kamaljit [1]
Kaur, Kuljit [2]
Affiliations
[1] Guru Nanak Dev Univ, Dept Comp Engg & Technol, Amritsar 143005, Punjab, India
[2] Guru Nanak Dev Univ, Dept Comp Sci, Amritsar 143005, Punjab, India
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2019 / Vol. 60 / Issue 03
Keywords
Hard disk drive; lead time; health status; N-splitting algorithm; machine learning; deep learning; data storage; unbalancing problem;
DOI
10.32604/cmc.2019.07675
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Hard disk drives (HDDs) are an essential component of cloud computing and big data systems, responsible for storing very large volumes of collected data. However, HDD failures pose a major challenge to big data servers and cloud service providers. Every year, about 10% of the disk drives used in servers crash at least twice, leading to data loss, recovery costs, and lower reliability. Researchers have recently used SMART parameters to develop various prediction techniques. However, these methods need to be improved for reliability and real-world usage for the following reasons: they cannot account for the gradual change or deterioration of HDDs; they fail to handle the data unbalancing and bias problems; and they lack adequate mechanisms for predicting the health status of HDDs. This paper introduces a novel voting-based decision tree classifier for failure prediction, a balance splitting algorithm for the data unbalancing problem, an advanced procedure for lead time estimation, and an R-CNN based approach for health status estimation. The system works robustly by considering gradual changes in SMART parameters. It was rigorously tested on 3 datasets and delivered benchmark results compared with the state of the art.
Pages: 913 / 946
Page count: 34
Related papers
32 in total
[1]  
Bairavasundaram LN, 2008, PROCEEDINGS OF THE 6TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES (FAST '08), P223
[2]  
Bairavasundaram LN, 2007, PERF E R SI, V35, P289
[3]  
Beach Brian, 2014, Hard Drive SMART Stats
[4]  
Eckart B, 2008, I S MOD ANAL SIM COM, P85
[5]   ESTIMATION OF LORENZ-CURVE AND GINI-INDEX [J].
GASTWIRT.JL .
REVIEW OF ECONOMICS AND STATISTICS, 1972, 54 (03) :306-316
[6]   Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products [J].
Granitto, Pablo M. ;
Furlanello, Cesare ;
Biasioli, Franco ;
Gasperi, Flavia .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2006, 83 (02) :83-90
[7]  
Hamerly G., 2001, ICML, V1, P202
[8]   Characterizing Disk Failures with Quantified Disk Degradation Signatures: An Early Experience [J].
Huang, Song ;
Fu, Song ;
Zhang, Quan ;
Shi, Weisong .
2015 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2015, :150-159
[9]   Improved disk-drive failure warnings [J].
Hughes, GF ;
Murray, JF ;
Kreutz-Delgado, K ;
Elkan, C .
IEEE TRANSACTIONS ON RELIABILITY, 2002, 51 (03) :350-357
[10]   Categorization of Crowd Varieties using Deep Concurrent Convolution Neural Network [J].
Khan, Gulraiz ;
Farooq, Muhammad Ali ;
Hussain, Junaid ;
Tariq, Zeeshan ;
Khan, Muhammad Usman Ghani .
2019 2ND INTERNATIONAL CONFERENCE ON ADVANCEMENTS IN COMPUTATIONAL SCIENCES (ICACS), 2019, :80-85