Training-free retrieval-based log anomaly detection with pre-trained language model considering token-level information

Times cited: 3
Authors
No, Gunho [1]
Lee, Yukyung [1]
Kang, Hyeongwon [1]
Kang, Pilsung [1]
Affiliation
[1] Korea Univ, Sch Ind & Management Engn, 145 Anam Ro, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Log data; Log anomaly detection; Retrieval; SYSTEM;
DOI
10.1016/j.engappai.2024.108613
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
As the information technology industry advances, demand is growing for log anomaly detection based solely on the printed log text. However, identifying anomalies in rapidly accumulating logs remains challenging. Traditional anomaly detection models require dataset-specific training, and that training introduces delays. Moreover, most methods focus only on sequence-level log information, which makes subtle anomalies hard to detect, and many rely on inference procedures that are difficult to run in real time. We introduce a retrieval-based log anomaly detection model that exploits inherent features of log data for real-time anomaly detection. Our model treats logs as natural language and extracts representations with pre-trained language models. After categorizing logs by system context, we apply a retrieval-based reformulation that contrasts each test log with the most similar normal logs. This strategy not only removes the need for log-specific training but also incorporates token-level information, enabling fine-grained detection, particularly for unseen logs. We also propose a coreset technique that reduces the computational cost of comparison. In experiments on three representative benchmarks, our model achieved an average F1-score of 0.9738, performing competitively with existing models without being trained on log data. Through a series of research questions, we verified its real-world usability, including real-time detection.
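The retrieve-then-compare idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and it rests on several assumptions. Token embeddings are plain float vectors standing in for pre-trained language model outputs. Cosine distance is used as the dissimilarity, and a log's sequence-level embedding is taken to be the mean of its token embeddings. The coreset is built with greedy k-center selection, a common coreset heuristic swapped in here. Only the single most similar normal log is retrieved. The helper names `mean_pool`, `greedy_coreset` and `anomaly_score` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: each log line is a list of token embeddings; toy 2-D vectors
# stand in for pre-trained language model outputs.
Vector = Sequence[float]
TokenMatrix = List[Vector]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity (1.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_pool(tokens: TokenMatrix) -> List[float]:
    """Assumed sequence-level embedding: the average of a log's token embeddings."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def greedy_coreset(logs: List[TokenMatrix], k: int) -> List[TokenMatrix]:
    """Hypothetical coreset step: greedy k-center selection on pooled embeddings.

    Repeatedly adds the normal log farthest from everything selected so far,
    so a small subset still covers the space of normal logs.
    """
    if k >= len(logs):
        return list(logs)
    pooled = [mean_pool(log) for log in logs]
    selected = [0]
    min_dist = [cosine_distance(p, pooled[0]) for p in pooled]
    while len(selected) < k:
        far = max(range(len(pooled)), key=lambda i: min_dist[i])
        if min_dist[far] <= 1e-12:  # every remaining log is already covered
            break
        selected.append(far)
        min_dist = [min(d, cosine_distance(p, pooled[far]))
                    for d, p in zip(min_dist, pooled)]
    return [logs[i] for i in selected]


def anomaly_score(test: TokenMatrix, normals: List[TokenMatrix]) -> float:
    """Retrieve the most similar normal log, then compare token by token.

    The score is the distance from the worst-matched test token to its nearest
    token in the retrieved normal log, so a single unusual token raises it.
    """
    query = mean_pool(test)
    nearest = min(normals, key=lambda log: cosine_distance(query, mean_pool(log)))
    return max(min(cosine_distance(t, n) for n in nearest) for t in test)


# Toy usage: three normal logs reduced to a two-log reference set.
normal_logs = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[1.0, 0.0], [1.0, 0.05]],
]
reference = greedy_coreset(normal_logs, k=2)
print(anomaly_score([[1.0, 0.0], [0.95, 0.05]], reference))  # small, well below 0.01
print(anomaly_score([[1.0, 0.0], [0.0, -1.0]], reference))   # 1.0
```

Taking the maximum over tokens, rather than comparing only pooled embeddings, is what lets one out-of-place token flag an otherwise familiar log. Keeping one reference set per system-context category would mean calling these helpers separately for each category. The score threshold that separates normal from anomalous logs is left open here.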
Pages: 12