Multi-Source Log Parsing With Pre-Trained Domain Classifier

Cited by: 1
Authors
Liu, Yilun [1 ]
Tao, Shimin [1 ]
Meng, Weibin [1 ]
Wang, Jingyu [2 ]
Yang, Hao [1 ]
Jiang, Yanfei [1 ]
Affiliations
[1] Huawei Inc, Translat Serv Ctr, Beijing 100085, Peoples R China
[2] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
Source
IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT | 2024, Vol. 21, No. 3
Keywords
Semantics; Classification algorithms; Training; Manuals; Task analysis; Maintenance engineering; Labeling; Multi-source log analysis; log parsing; domain classification; transfer learning; deep learning
DOI
10.1109/TNSM.2023.3329144
CLC number
TP [automation technology, computer technology]
Subject classification code
0812
Abstract
Automated log analysis with AI technologies is commonly used in network, system, and service operation and maintenance to ensure reliability and service quality. Log parsing serves as an essential primary stage in log analysis, where unstructured logs are transformed into structured data to facilitate subsequent downstream analysis. However, traditional log parsing algorithms designed for single-domain processing struggle with multi-source log inputs, leading to a decline in parsing accuracy, and adapting these algorithms to multi-source logs often requires extensive manual labeling. To address this, we propose Domain-aware Parser (DA-Parser), a framework that includes a domain classifier to identify the source domains of multi-source logs. This converts the multi-source log parsing problem into a series of single-source parsing problems. The classifier is pre-trained on a corpus of logs from 16 domains, eliminating the need for additional human labeling. The predicted source domain tags serve as constraints, limiting template extraction to logs from the same domain. Empirical evaluation on a multi-domain dataset demonstrates that DA-Parser outperforms the existing SOTA algorithm by 21.6% in parsing accuracy. The proposed approach also shows potential efficiency improvements, requiring only 6.67% of the time consumed by existing parsers, while remaining robust against minor domain classification errors.
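The core idea of the abstract — route each log line to its predicted source domain, then run single-source template extraction within each domain — can be illustrated with a minimal sketch. This is not the paper's implementation: the pre-trained domain classifier is stood in for by hypothetical keyword rules, and template extraction is a toy positional token comparison; all function names are illustrative assumptions.

```python
from collections import defaultdict

def classify_domain(log_line: str) -> str:
    # Stand-in for DA-Parser's pre-trained domain classifier
    # (hypothetical keyword rules for illustration only).
    if "blk_" in log_line:
        return "HDFS"
    if "kernel" in log_line:
        return "Linux"
    return "Other"

def extract_template(lines: list[str]) -> str:
    # Toy single-source template extraction: tokens identical across
    # all lines at a position are kept; varying tokens become <*>.
    token_lists = [line.split() for line in lines]
    length = min(len(t) for t in token_lists)
    template = []
    for i in range(length):
        tokens = {t[i] for t in token_lists}
        template.append(tokens.pop() if len(tokens) == 1 else "<*>")
    return " ".join(template)

def parse_multi_source(logs: list[str]) -> dict[str, str]:
    # Domain tags act as constraints: template extraction only ever
    # compares logs predicted to come from the same domain.
    buckets = defaultdict(list)
    for line in logs:
        buckets[classify_domain(line)].append(line)
    return {domain: extract_template(lines) for domain, lines in buckets.items()}

logs = [
    "Received block blk_1 of size 67108864 from /10.0.0.1",
    "Received block blk_2 of size 67108864 from /10.0.0.2",
    "kernel: CPU0 temperature above threshold",
    "kernel: CPU1 temperature above threshold",
]
print(parse_multi_source(logs))
```

Without the domain routing step, the HDFS and Linux lines would be compared against each other and the extracted templates would degrade toward all-wildcard patterns, which is the accuracy loss the paper attributes to single-domain parsers on multi-source input.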
Pages: 2651-2663 (13 pages)