Enhancements to Language Modeling Techniques for Adaptable Log Message Classification

Cited by: 2
Authors
Shehu, Yusufu [1]
Harper, Robert [1]
Affiliations
[1] Moogsoft Ltd, Sci Dept, Kingston Upon Thames KT1 1LF, England
Source
IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT | 2022, Vol. 19, No. 4
Keywords
Task analysis; Analytical models; Anomaly detection; Adaptation models; Training; Data models; Vocabulary; Computer network management; Deep learning; Transfer learning; Root cause analysis; Natural language processing; Classification algorithms; Fault localization
DOI
10.1109/TNSM.2022.3192756
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Minimizing the resolution time of service-impacting incidents is a fundamental objective of Information Technology (IT) operations. Efficient root cause analysis, adaptable to diverse service environments, is key to meeting this objective. One method that provides additional insight into an incident, and hence allows enhanced root cause analysis, is categorisation of the events and log messages that characterize an incident into pre-defined operational groups. Well-established natural language processing techniques that utilize pre-trained language models and word embeddings can be leveraged for this task. However, the adaptability of pre-trained models to classifying log messages, which contain large quantities of domain-specific language, remains unknown. The current contribution investigates multiple ways of addressing this deficiency. We demonstrate increased granularity of word embeddings by using character decompositions and sub-word level representations, and also explore the augmentation of word embeddings using features derived from convolutional operations. After observing that the performance of high-specificity models decreases as the number of previously unseen words increases, we explore the circumstances in which we can use a model trained with a low-specificity corpus to correctly classify log messages. Through the application of fine-tuning techniques, we can adapt our pre-trained classifier to classify log messages from service environments not encountered during pre-training in a time- and memory-efficient manner. We conclude that we can effectively adapt pre-trained classifiers for impromptu service environments.
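The abstract does not spell out how words are decomposed into characters or sub-words, so the following is only an illustrative sketch of one common scheme: character n-grams with boundary markers, in the style popularized by fastText. The function names `char_ngrams` and `ngram_overlap`, the n-gram range of 3 to 6, and the use of Jaccard overlap are assumptions made for this example and are not taken from the paper. The sketch shows why a sub-word representation can still say something about a token never seen in training: an unseen token often shares n-grams with a seen one.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of `word`, wrapped in '<' and '>' boundary markers.

    Assumption: fastText-style decomposition; the paper's exact scheme may differ.
    """
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


def ngram_overlap(seen, unseen, n_min=3, n_max=6):
    """Jaccard overlap between the n-gram sets of two tokens (hypothetical helper)."""
    a = set(char_ngrams(seen, n_min, n_max))
    b = set(char_ngrams(unseen, n_min, n_max))
    union = a | b
    if not union:
        # Tokens too short to produce any n-gram share nothing measurable.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # A token absent from the training vocabulary still shares sub-word
    # units with a related token that was present.
    print(char_ngrams("eth0", 3, 3))
    print(ngram_overlap("timeout", "timeouts"))
```

In an embedding model built this way, the vector for an out-of-vocabulary token is typically composed from the vectors of its n-grams, which is one plausible reading of the "increased granularity" the abstract describes.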
Pages: 4662-4675
Number of pages: 14