39 entries in total
[1]
Daly JT (2006) A higher order estimate of the optimum checkpoint interval for restart dumps. Fut Gener Comput Syst 22:303-312
[2]
Denning PJ (2005) The locality principle. Commun ACM 48:19-24
[3]
Dwork C (1988) Consensus in the presence of partial synchrony. J ACM 35:288-323
[4]
Egwutuoha IP (2013) A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems. J Supercomput 65:1302-1326
[5]
Elnozahy ENM (2002) A survey of rollback-recovery protocols in message-passing systems. ACM Comput Surv 34:375-408
[6]
George C (2012) ADFT: an adaptive framework for fault tolerance on large scale systems using application malleability. Proc Comput Sci 9:166-175
[7]
George C (2015) Fault tolerance on large scale systems using adaptive process replication. IEEE Trans Comput 64:2213-2225
[8]
Gupta S (2013) Locality principle revisited: a probability-based quantitative approach. J Parall Distrib Comput 73:1011-1027
[9]
Hargrove PH (2006) Berkeley Lab Checkpoint/Restart (BLCR) for Linux clusters. J Phys Conf Ser 46:494-499
[10]
Hellerstein JL (2001) A statistical approach to predictive detection. Comput Netw 35:77-95