Multi-Source Log Parsing With Pre-Trained Domain Classifier

Cited by: 1
Authors
Liu, Yilun [1]
Tao, Shimin [1]
Meng, Weibin [1]
Wang, Jingyu [2]
Yang, Hao [1]
Jiang, Yanfei [1]
Affiliations
[1] Huawei Inc., Translation Services Center, Beijing 100085, People's Republic of China
[2] Beijing University of Posts and Telecommunications, State Key Laboratory of Networking and Switching Technology, Beijing 100876, People's Republic of China
Source
IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2024, Vol. 21, No. 3
Keywords
Semantics; Classification algorithms; Training; Manuals; Task analysis; Maintenance engineering; Labeling; Multi-source log analysis; log parsing; domain classification; transfer learning; deep learning
DOI
10.1109/TNSM.2023.3329144
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Automated log analysis with AI techniques is widely used in network, system, and service operation and maintenance to ensure reliability and service quality. Log parsing is an essential first stage of log analysis, transforming unstructured logs into structured data for downstream analysis. However, traditional log parsing algorithms are designed for single-domain input and struggle with multi-source logs, leading to a decline in parsing accuracy; adapting them to multi-source logs typically requires extensive manual labeling. To address this, we propose Domain-aware Parser (DA-Parser), a framework that uses a domain classifier to identify the source domain of each log in a multi-source stream, converting the multi-source log parsing problem into a series of single-source parsing problems. The classifier is pre-trained on a corpus of logs from 16 domains, eliminating the need for additional human labeling. The predicted source-domain tags act as constraints, restricting template extraction to logs from the same domain. Empirical evaluation on a multi-domain dataset shows that DA-Parser outperforms the existing SOTA algorithm by 21.6% in parsing accuracy. The proposed approach is also efficient, requiring only 6.67% of the time consumed by existing parsers, while remaining robust to minor domain classification errors.
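As a rough illustration of the routing idea described in the abstract (a minimal sketch, not the authors' implementation), the Python snippet below first tags each raw log line with a predicted source domain and then extracts templates separately within each domain, so lines from different domains never collapse into a shared template. The rule-based classify_domain and the digit-masking extract_template are hypothetical stand-ins for DA-Parser's pre-trained 16-domain classifier and for a single-source parser such as Drain.

from collections import defaultdict
import re

def classify_domain(log_line: str) -> str:
    """Hypothetical stand-in for the paper's pre-trained domain classifier."""
    if "blk_" in log_line:
        return "HDFS"
    if "RAS KERNEL" in log_line:
        return "BGL"
    return "Other"

def extract_template(log_line: str) -> str:
    """Toy single-source template extraction: mask digit runs as <*>.
    A real system would apply a parser such as Drain here."""
    return re.sub(r"\d+", "<*>", log_line)

def parse_multi_source(log_lines):
    """Route each line to its predicted domain, then parse per domain,
    so templates are only extracted from logs sharing a domain tag."""
    buckets = defaultdict(list)
    for line in log_lines:
        buckets[classify_domain(line)].append(line)
    return {domain: sorted({extract_template(l) for l in lines})
            for domain, lines in buckets.items()}

logs = [
    "Received block blk_3587 of size 67108864 from 10.0.0.7",
    "Received block blk_9911 of size 67108864 from 10.0.0.9",
    "RAS KERNEL INFO 4 ddr errors detected and corrected",
]
for domain, templates in parse_multi_source(logs).items():
    print(domain, templates)

Running the sketch groups the two HDFS-like lines under one template while the BGL-like line is parsed on its own, which is the constraint the domain tags impose in DA-Parser.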
Pages: 2651-2663
Page count: 13