FD4C: Automatic Fault Diagnosis Framework for Web Applications in Cloud Computing

Cited by: 48
Authors
Wang, Tao [1]
Zhang, Wenbo [1]
Ye, Chunyang [2]
Wei, Jun [3]
Zhong, Hua [1]
Huang, Tao [3]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[2] Hainan Univ, Coll Informat Sci & Technol, Hainan 570228, Peoples R China
[3] Chinese Acad Sci, Inst Software, State Key Lab Comp Sci, Beijing 100190, Peoples R China
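Source
IEEE Transactions on Systems, Man, and Cybernetics: Systems | 2016 / Vol. 46 / No. 1
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Cloud computing; fault diagnosis; performance anomaly; software monitoring; Web applications; feature selection; invariants; algorithms
DOI
10.1109/TSMC.2015.2430834
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract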
The large-scale, dynamic cloud computing environment poses great challenges for fault diagnosis in Web applications. First, fluctuating workloads cause traditional application models to change over time; second, modeling the behavior of complex applications usually requires domain knowledge that is difficult to obtain; third, managing large-scale applications manually is impractical for operators. To address these issues, this paper proposes FD4C, an automatic fault (F) diagnosis (D) framework for (4) Web applications in cloud (C) computing. FD4C uses an online incremental clustering method to recognize access behavior patterns, and uses correlation analysis to model, within each access behavior pattern, the correlations between the workload and application performance/resource utilization metrics. FD4C detects faults by using control charts to discover abrupt changes in these correlation coefficients, and then identifies the fault-related metrics with a feature selection method. To evaluate the proposal, we inject typical faults into the TPC-W benchmark and apply FD4C to diagnose them. The experimental results show that FD4C can effectively detect the typical faults and accurately locate the metrics related to them.
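The detection idea in the abstract (track the correlation between workload and a metric, then flag abrupt changes with a control chart) can be illustrated with a minimal sketch. This is not FD4C's implementation: the helper names (`pearson`, `windowed_correlations`, `shewhart_alarms`), the fixed non-overlapping window, the Shewhart-style mean ± 3σ limits taken from a fault-free baseline, and the synthetic workload/CPU data are all hypothetical assumptions chosen for illustration. The access-pattern clustering and feature-selection steps are omitted.

```python
import math
import random
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length samples; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def windowed_correlations(workload, metric, window):
    """Hypothetical helper: workload-vs-metric correlation per non-overlapping window."""
    return [
        pearson(workload[i:i + window], metric[i:i + window])
        for i in range(0, len(workload) - window + 1, window)
    ]


def shewhart_alarms(coeffs, baseline, k=3.0):
    """Hypothetical helper: indices after the baseline outside mean +/- k*sigma."""
    ref = coeffs[:baseline]  # assumed fault-free reference windows (needs >= 2)
    center, sigma = mean(ref), stdev(ref)
    lower, upper = center - k * sigma, center + k * sigma
    return [i for i in range(baseline, len(coeffs))
            if not lower <= coeffs[i] <= upper]


# Synthetic, hypothetical data: CPU tracks load for 10 windows, then saturates.
rng = random.Random(42)
WINDOW = 30
workload, cpu = [], []
for t in range(15 * WINDOW):
    w = rng.uniform(50, 150)                    # requests per interval
    workload.append(w)
    if t < 10 * WINDOW:
        cpu.append(0.4 * w + rng.gauss(0, 2))   # normal: CPU proportional to load
    else:
        cpu.append(95 + rng.gauss(0, 2))        # injected fault: CPU pinned near 95%

coeffs = windowed_correlations(workload, cpu, WINDOW)
alarms = shewhart_alarms(coeffs, baseline=10)
print("coefficients:", [round(c, 3) for c in coeffs])
print("alarm windows:", alarms)
```

In the fault-free windows the coefficient stays close to 1, so the control limits are tight; once the injected fault decouples CPU from load, the coefficient drops toward 0 and windows 10 through 14 fall below the lower limit. Because the abstract says correlations are modeled within a specific access behavior pattern, a fuller sketch would keep a separate baseline and set of limits per pattern rather than one global chart.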
Pages: 61-75
Page count: 15