Assessment and Improvement of Hang Detection in the Linux Operating System

Cited by: 13
Authors
Cotroneo, Domenico [1]
Natella, Roberto [1]
Russo, Stefano [1]
Affiliations
[1] Univ Naples Federico II, Dipartimento Informat & Sistemist, Naples, Italy
Source
2009 28TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS | 2009
DOI
10.1109/SRDS.2009.26
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We propose a fault injection framework to assess hang detection facilities within the Linux Operating System (OS). The novelty of the framework lies in its adoption of a more representative faultload than existing ones, and in its effectiveness in terms of the number of hang failures produced; representativeness is supported by a field data study on the Linux OS. Using the proposed fault injection framework, along with realistic workloads, we find that the Linux OS is unable to detect hangs in several cases: we observe a relative coverage of 75%. To improve detection facilities, we propose a simple yet effective hang detector, which periodically tests OS liveness, as perceived by applications, by means of I/O system calls; this approach is shown to improve relative coverage up to 94%. The hang detector can be deployed on any Linux system, with an acceptable overhead.
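The idea of testing OS liveness "as perceived by applications" through I/O system calls can be illustrated with a minimal sketch. This is not the authors' detector: the function names `io_probe` and `probe_with_timeout`, the choice of system calls, and the timeout value are all assumptions made for illustration. The probe performs a small write/sync/read cycle on a temporary file, and the caller treats a probe that fails or does not finish in time as a suspected hang.

```python
import os
import tempfile
import threading
import time


def io_probe(directory):
    """Hypothetical probe: exercise open, write, fsync, read and unlink."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"ping")
        os.fsync(fd)                      # force the data through the I/O path
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 4) == b"ping"
    finally:
        os.close(fd)
        os.unlink(path)


def probe_with_timeout(probe, timeout_s):
    """Run probe in a daemon thread; True only if it succeeds within timeout_s."""
    result = []

    def run():
        try:
            result.append(bool(probe()))
        except OSError:
            result.append(False)          # an I/O error also counts as a failed probe

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)
    # An empty result means the probe is still blocked: a suspected hang.
    return bool(result) and result[0]


if __name__ == "__main__":
    # Assumed values: probe the system temp directory with a 5 s timeout.
    alive = probe_with_timeout(lambda: io_probe(tempfile.gettempdir()), 5.0)
    print("I/O path responsive" if alive else "suspected hang")
```

A periodic detector would call `probe_with_timeout` in a loop at a fixed interval. Note that a monitor running on the same machine can itself stall, so where the verdict is observed matters in any real deployment.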
Pages: 288-294
Number of pages: 7