Fault-Tolerance Implementation in Typical Distributed Stream Processing Systems

Times cited: 0
Authors
Chen, Wuhong [1]
Tsai, Jichiang [1]
Affiliations
[1] Natl Chung Hsing Univ, Dept Elect Engn, Taichung 402, Taiwan
Keywords
distributed stream processing; fault-tolerance; checkpoint; rollback recovery; high availability
DOI
Not available
Chinese Library Classification
TP [Automation and Computer Technology]
Discipline classification code
0812
Abstract
Typical training simulation systems adopt distributed network architectures composed of personal computers because of cost, extensibility, and maintenance considerations. In such a design, a failure or error in any one computer during operation can easily affect the functions of the entire system. Adopting appropriate fault-tolerance mechanisms, so that the entire system keeps operating normally when irregularities occur in a subsystem computer, is therefore an important consideration in the design of typical training simulation systems. Because firearms training simulation systems transmit and process substantial amounts of streaming data, they can be considered typical distributed stream processing systems. In this paper, we examined the designs and techniques of fault-tolerance mechanisms for typical distributed stream processing systems and applied them to a typical firearms training simulation system to increase its operational reliability and availability. We used the transparent checkpoint method to implement the fault-tolerance program. The results of single-machine fault-tolerance tests and multi-machine synchronized fault-tolerance tests indicate that the checkpoint establishment time and the rollback recovery time can satisfy the system's operating requirements.
Pages: 1167-1186
Number of pages: 20