A lightweight software fault-tolerance system in the cloud environment

Cited by: 7
Authors
Chen, Gang [1]
Jin, Hai [1]
Zou, Deqing [1]
Zhou, Bing Bing [2]
Qiang, Weizhong [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Cluster & Grid Comp Lab, Serv Comp Technol & Syst Lab, Wuhan 430074, Peoples R China
[2] Univ Sydney, Sch Informat Technol, Sydney, NSW 2006, Australia
Funding
U.S. National Science Foundation;
Keywords
Software Reliability; Software Self-healing; Cloud Computing; Virtual Machine; Dynamic Instrumentation;
DOI
10.1002/cpe.3190
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
With the development of cloud computing, the demand for highly available services is growing. Unfortunately, software failures greatly reduce system availability. This paper presents a lightweight software fault-tolerance system, called SHelp, which can effectively recover programs from many types of software bugs in the cloud environment. Building on error virtualization techniques, it proposes "weighted" rescue points to survive software failures by bypassing the faulty path. For multiple application instances running on different virtual machines, a three-level storage hierarchy with several comprehensive cache-updating algorithms for rescue point management is adopted to share error-handling information. On the one hand, SHelp reduces redundancy across multiple application instances; on the other hand, it recovers more effectively and quickly from faults caused by the same bugs. A Linux prototype is implemented on an open-source virtual machine monitor platform, Xen, and evaluated using four Web server applications that contain various types of bugs. The experimental results show that SHelp can recover server applications from these bugs in just a few seconds with modest performance overhead. Copyright © 2013 John Wiley & Sons, Ltd.
Pages: 2982-2998
Page count: 17