Software fault detection and recovery in critical real-time systems: An approach based on loose coupling

Cited by: 6
Authors
Alho, Pekka [1 ]
Mattila, Jouni [1 ]
Affiliations
[1] Tampere Univ Technol, Dept Intelligent Hydraul & Automat, FIN-33101 Tampere, Finland
Funding
EU Horizon 2020;
Keywords
ITER; Remote handling; Software; Fault tolerance; Dependability; Real-time; TOLERANCE;
DOI
10.1016/j.fusengdes.2014.04.050
Chinese Library Classification
TL [Atomic energy technology]; O571 [Nuclear physics];
Discipline classification codes
0827 ; 082701 ;
Abstract
Remote handling (RH) systems are used to inspect, modify, and maintain components in the ITER machine, and as such are an example of a mission-critical system. Failure in a critical system may cause damage, significant financial losses, and loss of experiment runtime, which makes dependability one of the most important properties of such systems. However, even if the software for RH control systems has been developed using best practices, the system might still fail due to undetected faults (bugs), hardware failures, etc. Critical systems therefore need the capability to tolerate faults and resume operation after they occur. Designing effective fault detection and recovery mechanisms is nevertheless challenging because of timeliness requirements, growth in scale, and complex interactions. In this paper we evaluate the effectiveness of a service-oriented architectural approach to fault tolerance in mission-critical real-time systems. We use a prototype implementation for service management with an experimental RH control system and an industrial manipulator. The fault tolerance relies on the high level of decoupling between services to recover from transient faults by restarting services. If the recovery process is not successful, the system can still be used, provided the fault was not in a critical software module. (C) 2014 Elsevier B.V. All rights reserved.
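The recovery idea summarized in the abstract (restart a faulty service, and keep operating in a degraded mode if a non-critical service cannot be recovered) can be illustrated with a minimal sketch. This is not the authors' prototype: the `Service` class, the `supervise` and `system_usable` functions, and the `max_restarts` parameter are hypothetical names chosen for illustration, and the sketch ignores the real-time constraints the paper is concerned with.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Service:
    """Hypothetical loosely coupled service with a health check and a restart hook."""
    name: str
    critical: bool
    is_healthy: Callable[[], bool]
    restart: Callable[[], None]


def supervise(services: List[Service], max_restarts: int = 3) -> Dict[str, str]:
    """Check each service once and restart unhealthy ones up to max_restarts times.

    Returns a status per service name: "ok", "recovered", or "failed".
    """
    status: Dict[str, str] = {}
    for svc in services:
        if svc.is_healthy():
            status[svc.name] = "ok"
            continue
        status[svc.name] = "failed"  # assumed until a restart succeeds
        for _ in range(max_restarts):
            svc.restart()
            if svc.is_healthy():
                status[svc.name] = "recovered"
                break
    return status


def system_usable(services: List[Service], status: Dict[str, str]) -> bool:
    """The system stays usable (possibly degraded) unless a critical service failed."""
    return not any(s.critical and status[s.name] == "failed" for s in services)
```

Because each service is restarted independently of the others, a transient fault in one service does not force a restart of the whole system, which is the property the loose coupling is meant to provide.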
Pages: 2272-2277
Number of pages: 6