Characterization and Prediction of Performance Loss and MTTR During Fault Recovery on Scale-Out Storage Using DOE & RSM: A Case Study with Ceph

Cited by: 1
Authors
Kong, Lay Wai [1]
Moreno, Orlando [1]
Affiliation
[1] Intel Corp, Chandler, AZ 85226 USA
Keywords
Storage performance; fault recovery; design of experiment; response surface modelling; Ceph; recovery operation parameters; HDD; SSD; availability
DOI
10.1109/TCC.2018.2874054
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recognizing the impact of cluster recovery operations on performance and mean time to recovery (MTTR) is essential to maintaining service and availability objectives. Testing the impact of recovery operations can reveal cause-and-effect relationships between recovery parameters and the responses of performance and MTTR during the recovery process. This study combines the systematic methodologies of design of experiments (DOE) and response surface methodology (RSM) to find, effectively and efficiently, the main effects and factor-to-factor interaction effects on the responses. Two Ceph clusters using different storage device technologies, HDD and SSD respectively, were used to characterize the impact of recovery operations on performance and MTTR. The combination of quadratic and linear effects from both Ceph clusters was determined and reported. With 28 tests, MTTR and performance models were developed for each Ceph cluster based on those quadratic and linear effects. These models predict performance and MTTR well when recovery parameters are adjusted. Using design of experiments and response surface methodology not only allows cause-and-effect analysis but also identifies potentially inefficient parameters that cause performance loss during recovery. This not only introduces a new method for studying cause and effect in MTTR but also indicates areas for improvement toward more efficient recovery operations.
Pages: 492-503
Page count: 12
Related papers
1 in total
  • [1] Fault Tolerance Performance Evaluation of Large-Scale Distributed Storage Systems HDFS and Ceph Case Study
    Arafa, Yehia
    Barai, Atanu
    Zheng, Mai
    Badawy, Abdel-Hameed A.
    2018 IEEE High Performance Extreme Computing Conference (HPEC), 2018