Transparent three-phase Byzantine fault tolerance for parallel and distributed simulations

Cited by: 3
Authors
Li, Zengxiang [1]
Cai, Wentong [2]
Turner, Stephen John [2]
Qin, Zheng [1]
Goh, Rick Siow Mong [1]
Affiliations
[1] Inst High Performance Comp, Singapore 138632, Singapore
[2] Nanyang Technol Univ, Singapore 639798, Singapore
Keywords
Parallel and distributed simulation; Byzantine fault tolerance; Replication; Checkpoint; Epidemic effect; Time synchronization; MECHANISM;
DOI
10.1016/j.simpat.2015.09.012
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline code
081203; 0835
Abstract
A parallel and distributed simulation (federation) is composed of a number of simulation components (federates). Since the federates may be developed by different participants and executed on different platforms, they are subject to Byzantine failures. Moreover, a failure may propagate through the federation, resulting in an epidemic effect. In this article, a three-phase (i.e., detection, location, and recovery) Byzantine Fault Tolerance (BFT) mechanism is proposed based on a transparent middleware approach. Replication, checkpointing, and message logging techniques are integrated in the mechanism to enhance simulation performance and reduce fault tolerance cost. In addition, mechanisms are provided to remove the epidemic effects of Byzantine failures. Our experiments have verified the correctness of the three-phase BFT mechanism and demonstrated its high efficiency and good scalability. For some simulation executions, the BFT mechanism may even achieve performance enhancement and Byzantine fault tolerance simultaneously. (C) 2015 Elsevier B.V. All rights reserved.
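The abstract names the three phases (detection, location, recovery) and the techniques combined (replication, checkpointing, message logging) but not how they fit together. As a rough illustration only, the following minimal sketch shows one generic way these techniques can be combined; it is not the authors' middleware or algorithm. All names and modeling choices are hypothetical assumptions: each federate runs as three replicas fed the same messages, a federate's state is a single integer, a Byzantine fault is modeled as a transient state corruption that a restart clears, detection is any disagreement among replica outputs, location is a majority vote, and recovery rolls a suspect replica back to its last checkpoint and replays its message log. Checkpoints are taken only after the replicas agree, so a checkpoint never captures corrupted state.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical replica of a federate whose state is a single integer."""
    rid: int
    state: int = 0
    faulty: bool = False                          # simulates a transient Byzantine fault
    checkpoint: int = 0                           # last state validated by agreement
    log: List[int] = field(default_factory=list)  # messages received since checkpoint

    def process(self, msg: int) -> int:
        """Log an incoming message, apply it, and return the resulting output."""
        self.log.append(msg)
        self.state += msg
        if self.faulty:
            self.state += 999                     # arbitrary (Byzantine) corruption
        return self.state

    def take_checkpoint(self) -> None:
        self.checkpoint = self.state
        self.log.clear()


def detect(outputs: Dict[int, int]) -> bool:
    """Detection: any disagreement among replica outputs signals a fault."""
    return len(set(outputs.values())) > 1


def locate(outputs: Dict[int, int]) -> Tuple[Optional[int], List[int]]:
    """Location: majority vote; replicas disagreeing with the majority are suspects."""
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if 2 * votes <= len(outputs):
        return None, sorted(outputs)              # no strict majority: cannot locate
    return value, sorted(rid for rid, out in outputs.items() if out != value)


def recover(replica: Replica) -> int:
    """Recovery: restart, roll back to the checkpoint, replay logged messages."""
    replica.faulty = False                        # assumption: restart clears the fault
    replica.state = replica.checkpoint
    pending, replica.log = replica.log, []
    for msg in pending:
        replica.process(msg)                      # replay also rebuilds the log
    return replica.state


def step(replicas: List[Replica], msg: int, checkpoint: bool) -> int:
    """Deliver one message to all replicas and return the agreed output."""
    outputs = {r.rid: r.process(msg) for r in replicas}
    value = outputs[replicas[0].rid]
    if detect(outputs):
        value, suspects = locate(outputs)
        if value is None:
            raise RuntimeError("fault detected but no majority to locate it")
        by_id = {r.rid: r for r in replicas}
        for rid in suspects:
            if recover(by_id[rid]) != value:
                raise RuntimeError(f"replica {rid} did not recover")
    if checkpoint:                                # all replicas agree at this point
        for r in replicas:
            r.take_checkpoint()
    return value


# Usage: three replicas let a majority vote mask one faulty replica.
replicas = [Replica(rid=i) for i in range(3)]
results = []
for t, msg in enumerate([5, 7, 3, 4]):
    if t == 2:
        replicas[1].faulty = True                 # inject a fault before message 3
    results.append(step(replicas, msg, checkpoint=(t % 2 == 1)))
print(results)                                    # [5, 12, 15, 19]
```

In this toy run the corrupted output of replica 1 is detected on the third message, outvoted by the other two replicas, and repaired by replaying the single message logged since the checkpoint taken after the second message. Checkpointing less often makes checkpoints cheaper but lengthens the log to replay, which is the usual trade-off between checkpoint and message-logging cost.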
Pages: 90-107
Number of pages: 18