A Reliability Model for Dependent and Distributed MDS Disk Array Units

Cited by: 6
Authors
Arslan, Suayb S. [1 ]
Affiliations
[1] MEF Univ, Dept Comp Engn, TR-34396 Istanbul, Turkey
Keywords
Distributed storage; erasure coding; Markov chains; maximum distance separability (MDS); mean time to data loss (MTTDL); STORAGE; CODES;
DOI
10.1109/TR.2018.2878503
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Archiving and systematic backup of large digital data generate a rapidly growing demand for multi-petabyte scale storage systems. As drive capacities continue to grow beyond the few-terabyte range to address the demands of today's cloud, the likelihood of multiple simultaneous disk failures has become a reality. Among the main factors causing catastrophic system failures, correlated disk failures and network bandwidth are reported to be the two most common sources of performance degradation. The emerging trend is to use efficient, sophisticated erasure codes (EC) equipped with multiple parities and efficient repairs in order to meet the reliability and bandwidth requirements. It is known that the mean time to failure and repair rates reported by disk manufacturers cannot capture the life-cycle patterns of distributed storage systems. In this study, we develop failure models based on generalized Markov chains that can accurately capture correlated performance degradations with multiparity protection schemes based on modern maximum distance separable (MDS) EC. Furthermore, we use the proposed model in a distributed storage scenario to quantify two example use cases: first, the common-sense notion that adding more parity disks is only meaningful if there is decent decorrelation between the failure domains of storage systems, and second, the reliability of generic multiple single-dimensional EC-protected storage systems.
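For orientation only, the following is a minimal sketch of the kind of baseline that Markov-chain MTTDL analyses usually start from. It is not the generalized model of the paper: it assumes independent, exponentially distributed disk failures and a single repair in progress at a time, so it ignores the correlated failures that the abstract emphasizes. The function name `mttdl_independent` and its parameters are hypothetical, chosen for illustration.

```python
def mttdl_independent(n: int, k: int, fail_rate: float, repair_rate: float) -> float:
    """Mean time to data loss of an (n, k) MDS-protected array.

    Assumptions (not from the paper): a textbook birth-death Markov chain
    whose state is the number of failed disks; each working disk fails
    independently at rate `fail_rate`; one disk is repaired at a time at
    rate `repair_rate`; data is lost when more than n - k disks have failed.
    """
    if not (0 < k <= n) or fail_rate <= 0 or repair_rate < 0:
        raise ValueError("need 0 < k <= n, fail_rate > 0, repair_rate >= 0")

    parities = n - k
    total = 0.0      # accumulated expected time to reach data loss
    prev_step = 0.0  # expected time to move from state i-1 to state i
    for failed in range(parities + 1):
        f = (n - failed) * fail_rate             # rate of one more failure
        r = repair_rate if failed > 0 else 0.0   # rate of completing a repair
        # Expected first-passage time from `failed` to `failed + 1`:
        # D_i = 1/f_i + (r_i / f_i) * D_{i-1}
        step = 1.0 / f + (r / f) * prev_step
        total += step
        prev_step = step
    return total


if __name__ == "__main__":
    # Hypothetical rates per hour: one failure per 10^6 h, one repair per 24 h.
    for parities in (1, 2, 3):
        hours = mttdl_independent(8 + parities, 8, 1e-6, 1 / 24)
        print(f"k=8, parities={parities}: MTTDL ~ {hours:.3e} h")
```

Under these independence assumptions each added parity multiplies the MTTDL by a large factor. The abstract's point is that this gain should not be taken for granted once the failure domains are correlated.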
Pages: 133-148
Number of pages: 16