Reliability of Centralized vs. Parallel Software Models for Composable Storage Systems

Cited by: 1
Authors
Blaum, Mario [1]
Muench, Paul [1]
Affiliations
[1] IBM Res Div Almaden, San Jose, CA 95120 USA
Source
2021 IEEE 21ST INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY (QRS 2021) | 2021
Keywords
Hyperconverged architectures; hyper-converged infrastructure (HCI); cloud applications; DIMM failure rate; metadata server; composable systems;
DOI
10.1109/QRS54544.2021.00064
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
Modern storage systems consist of many hardware and software components. The core of these systems is a set of server drawers containing data, at least one of which holds parity (a special case is two mirrored drawers). We analyze the failure rate of two such systems, both based on hyperconverged architectures: one centralized, in which the drawers share the metadata server, and one parallel, in which each drawer has its own metadata server. Inherently, the parallel systems will have greater reliability. However, the new CXL and Gen-Z architectures are enabling a centralized approach in which resources from multiple servers are combined to make a single virtual server. In this paper we analyze which techniques can bring the probability of failure of the centralized approach close to that of the parallel approach. We identify the probability of Dual In-Line Memory Module (DIMM) failure as the key differentiator between the probability of failure of the centralized and parallel systems, and we suggest methods to compensate for DIMMs with a high probability of failure.
Pages: 534-542
Page count: 9