Evaluating Reliability of SSD-Based I/O Caches in Enterprise Storage Systems

Cited by: 3
Authors
Ahmadian, Saba [1]
Taheri, Farhad [1]
Asadi, Hossein [1]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Data Storage Networks & Proc DSN Lab, Tehran, Iran
Funding
National Science Foundation (USA);
Keywords
Reliability; Power system reliability; Performance evaluation; Temperature distribution; Data centers; Computer architecture; Temperature measurement; Flash-based solid-state drives (SSDs); storage systems; I/O cache; reliability analysis; power outage; PERFORMANCE; RAID;
DOI
10.1109/TETC.2019.2945087
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
I/O caching techniques are widely employed in enterprise storage systems to enhance the performance of I/O-intensive applications in large-scale data centers. Because they offer higher performance than Hard Disk Drives (HDDs), and lower price and non-volatility compared to Dynamic Random-Access Memories (DRAMs), Flash-based Solid-State Drives (SSDs) are used as the main media in the caching layer of storage systems. Although SSDs are known as non-volatile devices, recent studies have reported a large number of data failures due to power outage in SSDs. To overcome the reliability implications of SSD-based I/O caching schemes, a RAID-1 (mirrored) configuration is commonly used to avoid data loss due to uncommitted write operations. Such a configuration, however, may still experience data loss in the cache layer due to correlated failures in SSDs. To our knowledge, no previous study has investigated the reliability of SSD-based I/O caching schemes in enterprise storage systems. In this paper, we present a comprehensive analysis of the reliability of SSD-based I/O caching architectures used in enterprise storage systems under power failure and high operating temperature. We explore a variety of SSDs from top vendors and investigate the cache reliability in a mirrored configuration. To this end, we first develop a physical fault injection and failure detection platform and then investigate the impact of workload-dependent parameters on the reliability of the I/O cache in the presence of two failure types common in data centers: power outage and high-temperature faults. We implement an I/O cache scheme using an open-source I/O cache module in the Linux operating system. The experimental results, obtained by conducting more than twenty thousand physical fault injections on the implemented I/O cache with different write policies, reveal that the failure rate of the I/O cache is significantly affected by workload-dependent parameters.
Our results show that, unlike the access pattern of workload requests, other workload-dependent parameters such as request size, Working Set Size (WSS), and the sequence of accesses have a considerable impact on the I/O cache failure rate. We observe a significant growth in the failure rate (by more than 14X) as the request size decreases. Furthermore, we observe that, in addition to writes, read accesses to the I/O cache are subject to failure in the presence of a sudden power outage (the failure mainly occurs while promoting data to the cache). In addition, we observe that the I/O cache experiences no data failure under high-temperature faults.
Pages: 1914-1929
Number of pages: 16