FD4C: Automatic Fault Diagnosis Framework for Web Applications in Cloud Computing

Cited by: 48
Authors
Wang, Tao [1]
Zhang, Wenbo [1]
Ye, Chunyang [2]
Wei, Jun [3]
Zhong, Hua [1]
Huang, Tao [3]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[2] Hainan Univ, Coll Informat Sci & Technol, Hainan 570228, Peoples R China
[3] Chinese Acad Sci, Inst Software, State Key Lab Comp Sci, Beijing 100190, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2016, Vol. 46, No. 1
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Cloud computing; fault diagnosis; performance anomaly; software monitoring; Web applications; FEATURE-SELECTION; INVARIANTS; ALGORITHMS;
DOI
10.1109/TSMC.2015.2430834
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The large-scale, dynamic cloud computing environment poses significant challenges for fault diagnosis in Web applications. First, fluctuating workloads cause traditional application models to change over time. Second, modeling the behavior of complex applications usually requires domain knowledge that is difficult to obtain. Third, managing large-scale applications manually is impractical for operators. To address these issues, this paper proposes FD4C, an automatic fault (F) diagnosis (D) framework for (4) Web applications in cloud (C) computing. FD4C uses an online incremental clustering method to recognize access behavior patterns, and correlation analysis to model the correlations between workloads and application performance/resource utilization metrics within a specific access behavior pattern. FD4C detects faults by discovering abrupt changes in the correlation coefficients with control charts, and then identifies the fault-related metrics using a feature selection method. To evaluate the proposal, we inject typical faults into the TPC-W benchmark and apply FD4C to diagnose them. The experimental results show that FD4C can effectively detect the typical faults and accurately locate the metrics related to them.
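The detection step described in the abstract (watching workload/metric correlation coefficients with a control chart) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which correlation measure or chart type FD4C uses, so the Pearson coefficient, the fixed non-overlapping windows, the Shewhart-style mean ± 3σ limits, the function names, and the synthetic data below are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(vx * vy)


def windowed_correlations(workload, metric, window):
    """Correlation of workload vs. metric over non-overlapping windows."""
    return [
        pearson(workload[i:i + window], metric[i:i + window])
        for i in range(0, len(workload) - window + 1, window)
    ]


def control_limits(baseline, k=3.0):
    """Shewhart-style limits (mean +/- k * sample std) from baseline values."""
    n = len(baseline)
    mean = sum(baseline) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in baseline) / (n - 1))
    return mean - k * std, mean + k * std


def out_of_control(coeffs, limits):
    """Indices of windows whose coefficient falls outside the limits."""
    lower, upper = limits
    return [i for i, c in enumerate(coeffs) if c < lower or c > upper]


# Synthetic example (hypothetical data): a metric tracks the workload for
# 300 samples, then a simulated fault breaks the relationship.
rng = random.Random(0)
workload = [rng.uniform(50, 150) for _ in range(400)]
metric = [0.5 * w + rng.gauss(0, 2) for w in workload[:300]]
metric += [rng.gauss(60, 20) for _ in range(100)]  # fault: metric ignores load

coeffs = windowed_correlations(workload, metric, window=20)  # 20 windows
limits = control_limits(coeffs[:10])  # first 10 windows act as the baseline
flagged = out_of_control(coeffs, limits)
print(flagged)
```

In this sketch the last five windows (indices 15 to 19) cover the simulated fault; their correlation collapses and falls below the lower limit. Healthy windows outside the baseline may occasionally be flagged as well, which is the usual false-alarm trade-off of a 3σ chart estimated from few samples.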
Pages: 61-75
Page count: 15