Transparent three-phase Byzantine fault tolerance for parallel and distributed simulations

Cited by: 3
Authors
Li, Zengxiang [1]
Cai, Wentong [2]
Turner, Stephen John [2]
Qin, Zheng [1]
Goh, Rick Siow Mong [1]
Affiliations
[1] Inst High Performance Comp, Singapore 138632, Singapore
[2] Nanyang Technol Univ, Singapore 639798, Singapore
Keywords
Parallel and distributed simulation; Byzantine fault tolerance; Replication; Checkpoint; Epidemic effect; Time synchronization; MECHANISM
DOI
10.1016/j.simpat.2015.09.012
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
A parallel and distributed simulation (federation) is composed of a number of simulation components (federates). Since the federates may be developed by different participants and executed on different platforms, they are subject to Byzantine failures. Moreover, a failure may propagate through the federation, resulting in an epidemic effect. In this article, a three-phase (i.e., detection, location, and recovery) Byzantine Fault Tolerance (BFT) mechanism is proposed based on a transparent middleware approach. Replication, checkpointing, and message logging techniques are integrated into the mechanism to enhance simulation performance and reduce fault tolerance cost. In addition, mechanisms are provided to remove the epidemic effects of Byzantine failures. Our experiments have verified the correctness of the three-phase BFT mechanism and illustrated its high efficiency and good scalability. For some simulation executions, the BFT mechanism may even achieve performance enhancement and Byzantine fault tolerance simultaneously. © 2015 Elsevier B.V. All rights reserved.
Pages: 90-107
Page count: 18