An empirical study of the impact of log parsers on the performance of log-based anomaly detection

Cited by: 16
Authors
Fu, Ying [1 ,2 ]
Yan, Meng [1 ,2 ]
Xu, Zhou [1 ,2 ]
Xia, Xin [3 ]
Zhang, Xiaohong [1 ,2 ]
Yang, Dan [1 ,2 ]
Affiliations
[1] Chongqing Univ, Key Lab Dependable Serv Comp Cyber Phys Soc, Minist Educ, Chongqing, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
[3] Huawei, Software Engn Applicat Technol Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Log parser; Anomaly detection; Empirical study;
DOI
10.1007/s10664-022-10214-6
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Log-based anomaly detection plays an essential role in the fast-emerging field of Artificial Intelligence for IT Operations (AIOps) for software systems, and many log-based anomaly detection methods have been proposed. Because logs are varied and unstructured, log parsing is the necessary first step in these methods: it converts raw logs into structured ones. Prior studies have found that the effectiveness of log parsing affects the performance of log-based anomaly detection. However, few studies have comprehensively investigated whether better log parsing implies better anomaly detection. In this paper, we conduct a comprehensive empirical study of the impact of six state-of-the-art log parsers from four categories (heuristic-based, frequency-based, clustering-based, and subsequence-based) on six state-of-the-art log-based anomaly detection methods (both machine-learning-based and deep-learning-based). Experimental results on three public datasets show that: (1) High parsing accuracy does not necessarily imply high anomaly detection performance. Both parsing accuracy and the number of parsed event templates should be considered when choosing a log parser for anomaly detection. (2) Log parsers affect the efficiency of anomaly detection methods. As the number of parsed event templates increases, the efficiency of anomaly detection decreases. Specifically, heuristic-based parsers have the least impact on the efficiency of anomaly detection methods, followed by frequency-based parsers. (3) All of the anomaly detection methods perform more effectively and efficiently with heuristic-based log parsers. Heuristic-based log parsers are therefore recommended for practitioners who are new to anomaly detection.
We believe that our work, with the evaluation results and the corresponding findings, can help researchers and practitioners better understand the impact of log parsers on anomaly detection and provide guidelines for choosing a suitable log parser for their anomaly detection method.
Pages: 39