A cloud-based triage log analysis and recovery framework

Cited by: 10
Authors
Qi, Guanqiu [1]
Tsai, Wei-Tek [1, 2]
Li, Wu [1]
Zhu, Zhiqin [3]
Luo, Yong [4]
Affiliations
[1] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, Tempe, AZ USA
[2] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing, Peoples R China
[3] Chongqing Univ Posts & Telecommun, Coll Automat, Chongqing, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Econ, Wuhan, Hubei, Peoples R China
Keywords
Log analysis; Production issue triage; Recovery; Big data; Cloud computing
DOI
10.1016/j.simpat.2017.07.003
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
With the development of cloud infrastructure, more and more transaction processing systems are hosted on cloud platforms. Logs, which usually record the production behavior of a cloud-hosted transaction processing system, are widely used for triaging production failures. Log analysis of a cloud-based system faces challenges as data size increases, unstructured formats emerge, and untraceable failures occur more frequently. Log analysis must also meet additional requirements, such as real-time analysis and failure recovery. Existing solutions each have their own focus and cannot fulfill these growing requirements. To address the main requirements and issues, this paper proposes a new log model that classifies and analyzes the interactions of services and the detailed logging information during workflow execution. A workflow analysis technique is used to quickly triage production failures and assist failure recovery. With the proposed log analysis solution, a failed workflow can be reconstructed from failures on real-time production servers. The proposed solution is simulated using a large volume of log data and compared with a traditional solution. The experimental results demonstrate the effectiveness and efficiency of the proposed triage log analysis and recovery solution. (C) 2017 Elsevier B.V. All rights reserved.
Pages: 292-316
Page count: 25