Reliability Characterization and Failure Prediction of 3D TLC SSDs in Large-Scale Storage Systems

Cited by: 8
Authors
Li, Peng [1 ,2 ,3 ]
Dang, Wei [1 ,2 ]
Lyu, Congmin [1 ,2 ]
Xie, Min [3 ,4 ]
Bao, Quanyang [5 ]
Ji, Xiaofeng [6 ]
Zhou, Jianhua [7 ]
Affiliations
[1] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Beijing 100094, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] City Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Peoples R China
[4] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
[5] Huawei Technol Co Ltd, Cloud & BG Comp Prod Line, Shenzhen 523000, Peoples R China
[6] Huawei Technol Co Ltd, Cloud & AI BG Comp Prod Line, Beijing 100095, Peoples R China
[7] Huawei Technol Co Ltd, Cloud & AI BG Comp Prod Line, Chengdu 611731, Peoples R China
Keywords
Solid state drive (SSD); reliability; machine learning; prediction methods; data storage systems; flash memory
DOI
10.1109/TDMR.2021.3063164
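Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract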
Solid state drives (SSDs) based on 3D triple-level cell (TLC) NAND flash are gradually becoming the dominant storage medium in large-scale storage systems because of their high storage density and low cost per bit. SSDs rank among the most frequently replaced hardware components in these systems, and their sheer number also increases the overall failure probability, which can lead to irreversible data loss and service unavailability. This paper is the first to investigate 3D TLC SSDs at the system level, characterizing their reliability and sub-health status from field Self-Monitoring, Analysis and Reporting Technology (SMART) data and proactively predicting impending failures. We explore real-world datasets and derive findings for each selected attribute in predetermined categories; these findings inform the subsequent feature selection and improve the interpretability of the prediction models. Moreover, various machine learning models are trained to predict failures ahead of time. Experimental results show that a random forest model can achieve an F1-score of 0.636 and a Matthews correlation coefficient (MCC) of 0.662 for a 7-day prediction horizon, and a 42.5% true positive rate (TPR) at a 0.00% false positive rate (FPR). Different time window sizes, training set fractions, and ratios of negative to positive samples are analyzed as well.
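For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how F1-score, MCC, TPR, and FPR are computed from confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper; they do not reproduce its results.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return F1, MCC, TPR and FPR from confusion-matrix counts.

    tp/fn: failed drives predicted as failed / as healthy.
    tn/fp: healthy drives predicted as healthy / as failed.
    Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # recall, true positive rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # false positive rate
    f1 = (2 * precision * tpr / (precision + tpr)
          if (precision + tpr) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f1": f1, "mcc": mcc, "tpr": tpr, "fpr": fpr}


# Hypothetical counts for illustration only (not from the paper).
metrics = classification_metrics(tp=40, fp=10, tn=940, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because failed drives are typically a small minority of a fleet, MCC and F1 are usually more informative here than plain accuracy, which is one plausible reason the abstract reports them alongside TPR and FPR.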
Pages: 224-235
Number of pages: 12
Related papers
45 in total
  • [1] Ahmadian S, 2018, DES AUT TEST EUROPE, P207, DOI 10.23919/DATE.2018.8342004
  • [2] Large Scale Predictive Analytics for Hard Disk Remaining Useful Life Estimation
    Anantharaman, Preethi
    Qiao, Mu
    Jadav, Divyesh
    [J]. 2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS), 2018, : 251 - 254
  • [3] [Anonymous], 2012, SOLID STATE DRIVE SS
  • [4] [Anonymous], 2016, SOLID STATE DRIVE SS
  • [5] Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives
    Cai, Yu
    Ghose, Saugata
    Haratsch, Erich F.
    Luo, Yixin
    Mutlu, Onur
    [J]. PROCEEDINGS OF THE IEEE, 2017, 105 (09) : 1666 - 1704
  • [6] Can Erasure Codes Damage Reliability in SSD-Based Storage Systems?
    Chamazcoti, Saeideh Alinezhad
    Safaei, Bardia
    Miremadi, Seyed Ghassem
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2019, 7 (03) : 435 - 446
  • [7] Reliability of NAND Flash Arrays: A Review of What the 2-D-to-3-D Transition Meant
    Compagnoni, Christian Monzio
    Spinelli, Alessandro S.
    [J]. IEEE TRANSACTIONS ON ELECTRON DEVICES, 2019, 66 (11) : 4504 - 4516
  • [8] Reviewing the Evolution of the NAND Flash Technology
    Compagnoni, Christian Monzio
    Goda, Akira
    Spinelli, Alessandro S.
    Feeley, Peter
    Lacaita, Andrea L.
    Visconti, Angelo
    [J]. PROCEEDINGS OF THE IEEE, 2017, 105 (09) : 1609 - 1633
  • [9] 3-D NAND Flash Value-Aware SSD: Error-Tolerant SSD Without ECCs for Image Recognition
    Deguchi, Yoshiaki
    Nakamura, Toshiki
    Hayakawa, Atsuna
    Takeuchi, Ken
    [J]. IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2019, 54 (06) : 1800 - 1811
  • [10] Write and Read Frequency-Based Word-Line Batch VTH Modulation for 2-D and 3-D-TLC NAND Flash Memories
    Deguchi, Yoshiaki
    Suzuki, Shun
    Takeuchi, Ken
    [J]. IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2018, 53 (10) : 2917 - 2926