Dynamic Checkpointing Policy in Heterogeneous Real-Time Standby Systems

Cited by: 21
Authors
Levitin, Gregory [1, 2]
Xing, Liudong [3]
Dai, Yuanshun [1]
Vokkarane, Vinod M. [4]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci, Collaborat Auton Comp Lab, Sheng 610051, Sichuan, Peoples R China
[2] Israel Elect Corp Ltd, POB 10, IL-31000 Haifa, Israel
[3] Univ Massachusetts Dartmouth, Dept Elect & Comp Engn, N Dartmouth, MA 02747 USA
[4] Univ Massachusetts Lowell, Dept Elect & Comp Engn, Lowell, MA 01854 USA
Funding
National Natural Science Foundation of China;
Keywords
Dynamic checkpointing; warm standby; mission success probability; real-time; optimization; element sequencing; RELIABILITY; RECOVERY; BACKUP; MODELS; TASKS;
DOI
10.1109/TC.2017.2667659
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper models 1-out-of-N standby computing systems with a dynamic checkpointing policy. The system performs a real-time mission task that must be completed within an allowed mission time. To facilitate effective failure recovery, the system performs checkpointing during the mission according to a policy that dynamically sets the checkpointing frequency based on the currently activated element and the work remaining to complete the mission. System elements are heterogeneous: they can follow different, arbitrary time-to-failure distributions, have different performance, and wait in different standby modes before activation. A new numerical algorithm based on state-space event transitions is proposed to evaluate the mission success probability of the real-time standby systems considered. The paper also formulates and solves optimal dynamic checkpointing policy problems, as well as an integrated optimization problem that finds the combination of checkpointing policy and element activation sequence that maximizes mission success probability. Examples demonstrate the advantages of the dynamic checkpointing policy over fixed, evenly spaced checkpoints, and illustrate the effects of different mission and element parameters on mission success probability and on the optimal dynamic checkpointing policy.
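The setting described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's state-space event-transition algorithm; it is a simplified simulation under assumptions that are not in the record: standby elements do not fail while waiting, every checkpoint costs a constant `ckpt_cost`, every takeover costs a constant `recovery_time`, and work done since the last checkpoint is lost on failure. All names (`run_once`, `success_probability`, `even_policy`, `dynamic_policy`) and all parameter values are hypothetical.

```python
import random

def run_once(elements, total_work, deadline, policy, ckpt_cost, recovery_time, rng):
    """Simulate one mission; return True if all work finishes by the deadline.

    elements: activation sequence of (speed, draw_ttf) pairs, where
              draw_ttf(rng) samples the element's operating time to failure.
    policy:   policy(element_index, remaining_work) -> positive amount of work
              to process before the next checkpoint.
    """
    clock = 0.0
    remaining = total_work  # work not yet protected by a checkpoint
    for idx, (speed, draw_ttf) in enumerate(elements):
        if idx > 0:
            clock += recovery_time  # assumed cost of resuming from the last checkpoint
        fail_at = clock + draw_ttf(rng)  # absolute time at which this element fails
        while remaining > 0:
            block = min(policy(idx, remaining), remaining)
            duration = block / speed
            if block < remaining:
                duration += ckpt_cost  # no checkpoint after the final block
            end = clock + duration
            if end > fail_at:
                clock = fail_at  # failure: the unfinished block is lost
                break
            if end > deadline:
                return False
            clock = end
            remaining -= block
        if remaining <= 0:
            return True
        if clock > deadline:
            return False
    return False  # every element failed before the mission completed

def success_probability(elements, total_work, deadline, policy,
                        ckpt_cost=0.5, recovery_time=1.0, runs=5000, seed=1):
    """Estimate mission success probability as the fraction of successful runs."""
    rng = random.Random(seed)
    wins = sum(
        run_once(elements, total_work, deadline, policy, ckpt_cost, recovery_time, rng)
        for _ in range(runs)
    )
    return wins / runs

# Hypothetical heterogeneous elements: (speed, time-to-failure sampler).
ELEMENTS = [
    (1.0, lambda rng: rng.expovariate(1 / 60.0)),     # exponential, mean 60
    (1.5, lambda rng: rng.weibullvariate(80.0, 1.5)),  # Weibull, scale 80, shape 1.5
    (0.8, lambda rng: rng.expovariate(1 / 120.0)),    # exponential, mean 120
]
TOTAL_WORK = 60.0
DEADLINE = 110.0

def even_policy(idx, remaining):
    # Fixed, evenly spaced checkpoints: always one quarter of the total work.
    return TOTAL_WORK / 4

def dynamic_policy(idx, remaining):
    # Block size depends on the active element and on the remaining work.
    base = (10.0, 20.0, 30.0)[idx]
    return min(base, max(remaining / 2, 5.0))

if __name__ == "__main__":
    for name, pol in (("even", even_policy), ("dynamic", dynamic_policy)):
        print(name, success_probability(ELEMENTS, TOTAL_WORK, DEADLINE, pol))
```

The policy is passed as a function of the active element index and the remaining work, mirroring the two inputs the abstract names. Which policy scores higher depends entirely on the assumed parameters, so the sketch makes no claim about the paper's results.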
Pages: 1449-1456
Number of pages: 8