Efficient Analysis of Repairable Computing Systems Subject to Scheduled Checkpointing

Cited by: 16
Authors
Mo, Yuchang [1]
Xing, Liudong [2, 3]
Lin, Yi-Kuei [4]
Guo, Wenzhong [5]
Affiliations
[1] Huaqiao Univ, Fujian Prov Univ Key Lab Computat Sci, Sch Math Sci, Quanzhou 362021, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Peoples R China
[3] Univ Massachusetts Dartmouth, Dept Elect & Comp Engn, Dartmouth, MA 02747 USA
[4] Natl Chiao Tung Univ, Dept Ind Engn & Management, Hsinchu 300, Taiwan
[5] Fuzhou Univ, Coll Math & Comp Sci, Fujian Prov Key Lab Network Comp & Intelligent In, Fuzhou 350116, Peoples R China
Keywords
Repairable computing systems; checkpointing; system reliability; mission completion time; multi-valued decision diagram (MDD)
DOI
10.1109/TDSC.2018.2869393
CLC number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
To improve the success probability of a mission execution, scheduled checkpointing is often implemented to save completed portions of the mission task, so that the system can resume the mission effectively after restoration whenever a system failure occurs. This paper considers a repairable computing system subject to scheduled checkpointing. The checkpointing intervals are deterministic but may be even or uneven. The system repair time is fixed, while the system time-to-failure can follow an arbitrary distribution. The maximum number of repairs is bounded by a specified threshold. A multi-valued decision diagram (MDD)-based analytical approach is proposed to evaluate the exact success probability of a mission execution for the considered repairable system. The proposed approach generates a compact mission MDD model in which identical sub-MDD models are merged to improve computational efficiency and reduce storage requirements. Once constructed, the MDD model can be reused for system reliability evaluations with different input parameter values. A benchmark study demonstrates the efficiency of the proposed MDD approach. A case study illustrates how the proposed MDD approach can support decisions about system design and parameter selection.
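As a rough illustration of the quantity the abstract describes, the sketch below computes a mission success probability with a plain dynamic program over the number of repairs used. It is not the paper's MDD method, and it rests on assumptions the abstract does not state: the failure clock restarts at every checkpoint and after every repair, a failed segment is re-executed in full, and there is no mission deadline (so the fixed repair time does not enter the result). The names `mission_success_probability`, `intervals`, `survival`, and `max_repairs` are hypothetical.

```python
import math
from typing import Callable, Sequence


def mission_success_probability(
    intervals: Sequence[float],
    survival: Callable[[float], float],
    max_repairs: int,
) -> float:
    """Probability of finishing all segments with at most max_repairs repairs.

    Assumption (not from the abstract): each attempt at a segment of length
    tau succeeds independently with probability survival(tau), i.e. the
    system is as good as new after every checkpoint and every repair.
    """
    # dist[k] = probability that exactly k repairs have been used so far
    # and the mission is still alive.
    dist = [1.0] + [0.0] * max_repairs
    for tau in intervals:
        p = survival(tau)   # one attempt at this segment succeeds
        q = 1.0 - p         # one attempt at this segment fails
        nxt = [0.0] * (max_repairs + 1)
        for used, prob in enumerate(dist):
            # j failures (each followed by a repair), then one success.
            for j in range(max_repairs - used + 1):
                nxt[used + j] += prob * (q ** j) * p
        dist = nxt
    return sum(dist)


# Example with an exponential time-to-failure (hypothetical rate 0.1)
# and uneven checkpoint intervals.
rate = 0.1
exp_survival = lambda t: math.exp(-rate * t)
print(mission_success_probability([1.0, 2.0, 1.5], exp_survival, max_repairs=2))
```

Any survival function can be passed in place of the exponential one, which loosely mirrors the abstract's allowance for arbitrary time-to-failure distributions. Modeling a deadline or ageing across checkpoints would need a richer state than this sketch keeps.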
Pages: 1-14
Number of pages: 14