A Search-based Approach for Accurate Identification of Log Message Formats

Cited by: 107
Authors
Messaoudi, Salma [1]
Panichella, Annibale [1]
Bianculli, Domenico [1]
Briand, Lionel [1]
Sasnauskas, Raimondas [2]
Affiliations
[1] Univ Luxembourg, Luxembourg, Luxembourg
[2] SES, Luxembourg, Luxembourg
Source
2018 IEEE/ACM 26TH INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2018) | 2018
Funding
European Research Council;
Keywords
log parsing; log analysis; log message format; NSGA-II; ALGORITHM;
DOI
10.1145/3196321.3196340
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Many software engineering activities process the events contained in log files. However, before performing any processing activity, it is necessary to parse the entries in a log file to retrieve the actual events recorded in the log. Each event is denoted by a log message, which is composed of a fixed part, called the (event) template, that is the same for all occurrences of the same event type, and a variable part, which may vary with each event occurrence. The formats of log messages, in complex and evolving systems, have numerous variations, are typically not entirely known, and change frequently; therefore, they need to be identified automatically. The log message format identification problem deals with the identification of the different templates used in the messages of a log. Any solution to this problem has to generate templates that meet two main goals: they must not be too general, so that different events can be distinguished, and they must not be too specific, so that different occurrences of the same event are not treated as following different templates. These two goals, however, conflict. In this paper, we present the MoLFI approach, which recasts the log message format identification problem as a multi-objective problem. MoLFI uses an evolutionary approach to solve this problem, by tailoring the NSGA-II algorithm to search the space of solutions for a Pareto optimal set of message templates. We have implemented MoLFI in a tool, which we have evaluated on six real-world datasets, containing log files with 2K to 300K entries. The experimental results show that MoLFI extracts by far the highest number of correct log message templates, significantly outperforming two state-of-the-art approaches on all datasets.
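The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the MoLFI implementation: the wildcard token `*`, the two scoring functions `frequency` and `specificity`, the sample log lines, and the choice to score single templates rather than sets of templates are all assumptions made for illustration. The sketch only shows why a template that is too general and one that is too specific can both be Pareto optimal, while a template that is worse on one goal and no better on the other is discarded.

```python
from typing import List, Sequence, Tuple

WILDCARD = "*"  # hypothetical marker for the variable part of a message

def matches(template: Sequence[str], message: Sequence[str]) -> bool:
    """True when lengths agree and every fixed token equals the message token."""
    if len(template) != len(message):
        return False
    return all(t == WILDCARD or t == m for t, m in zip(template, message))

def frequency(template: Sequence[str], messages: List[List[str]]) -> float:
    """Share of messages matched; higher means more general (assumed objective)."""
    return sum(matches(template, m) for m in messages) / len(messages)

def specificity(template: Sequence[str]) -> float:
    """Share of fixed tokens; higher means more specific (assumed objective)."""
    return sum(t != WILDCARD for t in template) / len(template)

def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance for maximization: no worse everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates: List[List[str]], messages: List[List[str]]) -> List[List[str]]:
    """Keep the candidates whose objective vector no other candidate dominates."""
    scored = [(c, (frequency(c, messages), specificity(c))) for c in candidates]
    return [c for c, s in scored if not any(dominates(o, s) for _, o in scored)]

# Made-up log lines, tokenized on whitespace.
logs = [line.split() for line in (
    "send pkt 1 to A",
    "send pkt 2 to B",
    "recv pkt 3 from A",
)]

too_general = ["*", "*", "*", "*", "*"]
balanced = ["send", "pkt", "*", "to", "*"]
too_specific = ["send", "pkt", "1", "to", "A"]
dominated = ["send", "*", "*", "*", "*"]  # matches the same lines as `balanced`, with fewer fixed tokens

front = pareto_front([too_general, balanced, too_specific, dominated], logs)
print(front)
```

In this sketch `dominated` is dropped because `balanced` matches the same messages with more fixed tokens, while the other three candidates remain as incomparable trade-offs. A search procedure such as NSGA-II explores this kind of front over a much larger candidate space.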
Pages: 167-177
Number of pages: 11