Enhancements to Language Modeling Techniques for Adaptable Log Message Classification

Cited by: 2

Authors:

Shehu, Yusufu [1]

Harper, Robert [1]

Affiliations:

[1] Moogsoft Ltd, Sci Dept, Kingston Upon Thames KT1 1LF, England

Source:

IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT | 2022 / Vol. 19 / Issue 04

Keywords:

Task analysis; Analytical models; Anomaly detection; Adaptation models; Training; Data models; Vocabulary; Computer network management; deep learning; transfer learning; root cause analysis; natural language processing; classification algorithms; FAULT LOCALIZATION;

DOI:

10.1109/TNSM.2022.3192756

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Minimizing the resolution time of service-impacting incidents is a fundamental objective of Information Technology (IT) operations. Efficient root cause analysis, adaptable to diverse service environments, is key to meeting this objective. One method that provides additional insight into an incident, and hence allows enhanced root cause analysis, is categorisation of the events and log messages that characterize an incident into pre-defined operational groups. Well-established natural language processing techniques that utilize pre-trained language models and word embeddings can be leveraged for this task. The adaptability of pre-trained models to classify log messages containing large quantities of domain-specific language remains unknown. The current contribution investigates multiple ways of addressing this deficiency. We demonstrate increased granularity of word embeddings by using character decompositions and sub-word level representations, and also explore the augmentation of word embeddings using features derived from convolutional operations. After observing that the performance of high-specificity models decreases as the number of previously unseen words increases, we explore the circumstances in which we can use a model trained with a low-specificity corpus to correctly classify log messages. Through the application of fine-tuning techniques, we can adapt our pre-trained classifier to classify log messages from service environments not encountered during pre-training in a time- and memory-efficient manner. We conclude that we can effectively adapt pre-trained classifiers for impromptu service environments.


Pages: 4662 - 4675

Number of pages: 14

57 references in total

[1] [Anonymous], CISC IOS XR SYST ERR
[2] [Anonymous], 2019, STACK EXCH XML DAT R
[3] Brown TB, 2020, arXiv, arXiv:2005.14165
[4] Baevski A., 2018, arXiv, DOI 10.48550/arXiv.1809.10853
[5] Bao, Liang; Li, Qian; Lu, Peiyao; Lu, Jie; Ruan, Tongxiao; Zhang, Ke. Execution anomaly detection in large-scale systems through console log analysis [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2018, 143: 172 - 186
[6] Bengio, Y; Ducharme, R; Vincent, P; Jauvin, C. A neural probabilistic language model [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (06): 1137 - 1155
[7] Bojanowski P., 2017, Trans. Assoc. Comput. Linguistics, V5, P135, DOI 10.1162/tacl_a_00051
[8] Chai, Junyi; Li, Anming. Deep Learning in Natural Language Processing: A State-of-the-Art Survey [J]. PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), 2019: 535 - 540
[9] Chiu Jason P. C., 2016, Named entity recognition with bidirectional LSTM-CNNs
[10] Cinque, Marcello; Cotroneo, Domenico; Pecchia, Antonio. Event Logs for the Analysis of Software Failures: A Rule-Based Approach [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2013, 39 (06): 806 - 821
